I. INTRODUCTION

We continue in this paper our work on stochastic processes conditioned on large deviations [1, 2]. The idea is to describe the evolution of a stochastic process when one or more ‘observables’ of this process are observed to fluctuate in time away from their typical values. The approach followed is based on large deviation theory and proceeds by conditioning a Markov process $X_t$ on a rare event, defined in terms of a functional $A_T$ of the process integrated over the time interval $[0,T]$, and by deriving from this conditioning a new Markov process $\hat X_t$, called the auxiliary, effective or driven process, which can be proved to be equivalent to the conditioned process as $T\to\infty$. In this limit, it is thus possible to describe the trajectories, paths or histories of $X_t$ leading to rare fluctuations of $A_T$ by a conditioning-free Markov process identified as the driven process.

We have explained in [1, 2] how this conditioning problem relates in physics to generalizations 
of the microcanonical and canonical ensembles defined on paths of nonequilibrium systems, and 
how the driven process is constructed mathematically by a generalization of Doob’s h-transform, 
used in probability theory to describe simple conditionings of Markov processes [3-8]. Physically,
the driven process can also be seen as a generalization of the notion of fluctuation paths describing
dynamical fluctuations in low-noise systems in terms of most likely paths (also called escape
or reactive paths). The theory of these paths, widely used in physics, biology and chemistry to
describe noise-activated processes [9-12], goes back to Onsager and Machlup [13] and was
formalized in the 1970s and 1980s as the Freidlin-Wentzell-Graham (FWG) large deviation theory of
dynamical systems perturbed by noise [14-17]. Our approach generalizes this theory in that it is 
not restricted to low-noise or low-temperature systems: it can be applied in principle to any Markov 
system driven arbitrarily far from equilibrium by external forces, boundary reservoirs, and noise 
sources to describe, by means of an effective stochastic process rather than a single deterministic 
path, how fluctuations arise in time. 

Other links between the driven process, control theory, rare event simulations, and the physics 
of nonequilibrium systems are mentioned in [2]. Our goal here is to provide more details about the 
link with control theory, and to show in particular that the driven process can be interpreted as 
an optimal stochastic control process minimizing a cost function related to the large deviations 
of the conditioning observable $A_T$. We also show, as a prelude to this control result, that the
driven process can be characterized by a number of equivalent variational principles involving large 
deviation functions and relative entropies. 

These principles follow, as most variational principles of statistical mechanics [17-19], from the 
so-called contraction principle of large deviation theory and describe the fact that nonequilibrium 
systems ‘build up’ fluctuations in an optimal way to reach states far from their typical states. This 
is very much in the spirit of the Onsager-Machlup principle of minimal dissipation [13] and of 
generalizations of this principle forming the basis of the FWG theory. The crucial difference again 
is that the principles that we derive can be applied to weak- and strong-noise systems, in addition 
to equilibrium and nonequilibrium systems. 

As in our previous work [1, 2], the results that we discuss relate to and contain many results 
obtained before, but also clarify, unify, and generalize them, we believe, in many ways. We briefly 
discuss these relations and generalizations next; more specific references and explanations will be 
given in the following sections, especially in Sec. IV and the conclusion section V. 


A. Control approaches to large deviations 

Our main source for this paper is the work of Wendell H. Fleming and his collaborators [20-30] on transforming linear partial differential equations (PDEs) arising in large deviation theory into nonlinear PDEs that have the form of Hamilton-Jacobi-Bellman (HJB) equations arising in control theory; see also [31-33]. From this control mapping, which is essentially a Hopf-Cole transform referred to as a logarithmic transform by Fleming, one is able to obtain large deviation functions by solving HJB equations using control methods. This can be applied either in the low-noise or the long-time limit of large deviations, the latter being the subject of the Donsker-Varadhan (DV) theory [34-36].
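To see concretely how a logarithmic transform turns a linear equation into an HJB equation, here is a short derivation. It is a sketch under stated assumptions, not Fleming's general setting: the diffusion is written in Itô form with drift $b$ (a symbol introduced only for this derivation) and an invertible diffusion matrix $D$, the linear equation is taken as $\partial_t\phi = L\phi$ with $L = b\cdot\nabla + \frac12 D:\nabla\nabla$, where $D:\nabla\nabla = \sum_{ij} D_{ij}\partial_i\partial_j$, and the sign and time conventions may differ from those used in [20-30].

```latex
\begin{aligned}
&\partial_t \phi = b\cdot\nabla\phi + \tfrac12 D:\nabla\nabla\phi,
  \qquad \phi = e^{-V}
  \quad\Longrightarrow\quad
  \partial_t V = b\cdot\nabla V + \tfrac12 D:\nabla\nabla V - \tfrac12\,\nabla V\cdot D\,\nabla V,\\[4pt]
&-\tfrac12\,\nabla V\cdot D\,\nabla V
  = \min_{v}\Big\{ v\cdot\nabla V + \tfrac12\, v\cdot D^{-1} v \Big\},
  \qquad v^* = -D\nabla V = D\nabla\ln\phi,\\[4pt]
&\partial_t V = \min_{v}\Big\{ (b+v)\cdot\nabla V + \tfrac12 D:\nabla\nabla V
  + \tfrac12\, v\cdot D^{-1} v \Big\}.
\end{aligned}
```

The last line is an HJB equation in which $v$ acts as a control added to the drift with quadratic running cost $\frac12 v\cdot D^{-1}v$; the optimal control $v^* = D\nabla\ln\phi$ is a gradient of the logarithm of the solution of the original linear equation.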

We have explained this approach briefly in [2]; see also [37-39] for reviews and [40] for another 
useful summary. The connection with our work comes from the fact that the optimal control process 
obtained in this approach corresponds to the driven process in the case of time-integrated observables 
studied in the long-time DV limit. To our knowledge this connection was not made before. It is 
known from the work of Fleming and Sheu [23-25] that the controlled process corresponds to a
change of measure of the original process considered, but no link was established between this 
change of measure, the generalized Doob transform, and the conditioning problem. 

These connections are established in Sec. IV and complement Fleming’s theory by showing that 
the optimal process solving a large class of stochastic HJB equations with quadratic cost corresponds 
to a large deviation conditioning of a non-controlled process. This interpretation applies to low-noise 
(FWG) large deviation problems, but also to long-time (DV) large deviation problems for systems 
that are not necessarily perturbed by a weak noise. In addition, we consider, as explained next, 
the control mapping for a significantly wider class of control costs and observables suggested by 
nonequilibrium systems. 


B. Stochastic control with current-type costs 

Historically, optimal control theory and dynamic programming [41-45] have been developed for 
cost functionals of deterministic and stochastic systems having the form

$$C_t = \int_t^T \ell(X_s)\,ds + \varphi(X_T), \tag{1}$$

where $\ell$ and $\varphi$ are arbitrary functions of the state variable $X_s$ and $t < T$. Applications in nonequilibrium statistical physics require that we consider more general costs that involve not only an integral of $X_s$, but also a sum that depends on the jumps of $X_s$, in the case of a jump process, or an integral over its increments, in the case of a pure diffusion.

A generalization of stochastic control theory to these costs, which are related physically to 
particle and energy currents, has been proposed recently by Bierkens, Chernyak, Chertkov and
Kappen [46, 47]. Their approach follows that of Kappen [48-50], which is itself a reversal of Fleming’s
approach: they derive the HJB equation for the solution of a special stochastic optimization problem
involving a quadratic cost and then apply a logarithmic transform to this equation to obtain a
linear PDE which can be solved using spectral or path integral methods.

The present paper can be seen as an alternative approach to this generalization of stochastic 
control and optimization, providing new proofs of the HJB equation for current-type costs. In fact, 
we provide in Appendix A 6 a simple proof of this equation, which relies only on Itô’s calculus. Our
results also provide, as a complement to Fleming’s approach, a probabilistic interpretation of the 
optimal control process and relate quadratic costs to large deviations. This is the subject of Sec. IV. 


C. Spectral characterizations of positive operators 

An interesting consequence of the control approach to large deviations is that it provides an 
interpretation of some variational characterizations of the dominant eigenvalue of linear positive 
operators, in particular, the Rayleigh-Ritz variational principle of quantum mechanics [51], as well
as generalizations of this principle obtained for non-hermitian operators by Donsker and Varadhan 
[52-54] and by Holland [55, 56]. A control approach to dominant eigenvalues was also proposed by 
Fleming and Sheu for general additive costs [23, 25, 30] (see [38] for a review) and is mentioned in 
the context of current-type costs by Bierkens, Chernyak, Chertkov and Kappen [46, 47]. 

Our work can be seen as yet another approach to this problem in which the DV characterization 
of dominant eigenvalues follows as a special case of more general variational principles that we 
derive from the contraction principle as applied to the so-called level 2.5 of large deviations [57-61]. 
The optimizer of these variational principles is the driven process, as will be shown in Sec. III, so
that these principles can be used to characterize that process independently of its relationship with 
the optimal control process and the conditioning problem. 


D. Variational methods for computing large deviations 

The DV characterization of dominant eigenvalues is useful not only from a spectral point of 
view, but also as a computing tool for obtaining large deviation functions. One important function 
of large deviation theory, the scaled cumulant generating function, is known to correspond for many 
observables of Markov processes of interest to the dominant eigenvalue of a positive linear operator 
called the tilted generator. Having a variational representation for this eigenvalue allows one to 
obtain useful approximations and bounds to the scaled cumulant generating function. 
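As a concrete illustration of the statement that the SCGF is the dominant eigenvalue of a tilted generator, the sketch below uses the simplest possible case, a two-state Markov jump process (jump processes are treated in Appendix B). The rates `a` and `b`, the function names, and the choice of observable (the fraction of time spent in state 1, a purely state-dependent observable) are hypothetical choices made only for this example. For such an observable the tilted generator is the generator plus $k$ times the diagonal matrix of $f$, and its dominant eigenvalue can be compared with the closed-form largest root of the 2x2 characteristic polynomial.

```python
import numpy as np


def tilted_generator_two_state(a: float, b: float, k: float) -> np.ndarray:
    """Tilted generator L_k = L + k diag(f) for a hypothetical two-state jump process.

    State 0 jumps to 1 at rate a, state 1 jumps to 0 at rate b, and the observable
    is the fraction of time spent in state 1, i.e. f = (0, 1).
    """
    L = np.array([[-a, a], [b, -b]], dtype=float)
    f = np.array([0.0, 1.0])
    return L + k * np.diag(f)


def scgf_numeric(a: float, b: float, k: float) -> float:
    """Dominant (Perron-Frobenius) eigenvalue of the tilted generator."""
    eigenvalues = np.linalg.eigvals(tilted_generator_two_state(a, b, k))
    return float(np.max(eigenvalues.real))


def scgf_exact(a: float, b: float, k: float) -> float:
    """Largest root of lambda^2 - t lambda - a k = 0, with t = k - a - b the trace."""
    t = k - a - b
    return 0.5 * (t + np.sqrt(t * t + 4.0 * a * k))
```

The derivative of $\Lambda_k$ at $k = 0$ recovers the stationary occupation $a/(a+b)$ of state 1, that is, the typical value of the observable.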

Variational principles also exist for another large deviation function, the rate function, which 
is the main function of interest in large deviation theory. The DV variational formula for the 
level-2 rate function [34-36] is one such principle, as is a variant of that principle derived by Baldi 
and Piccioni [62] for finite-space jump processes via the level 2.5 of large deviations. In physics, 
variational principles for large deviation functions have also been proposed for DV large deviation 
problems by Eyink [63-65], who refers to them as action principles related to the Rayleigh-Ritz 
method, by Nemoto and Sasa [66-68] (see also [69]) for special observables of one-dimensional 
diffusions, and by Jack and Sollich [70] for jump processes. 

The results that we derive in Sec. III include all of these variational principles and show that
the optimizer of these principles is the driven process. This provides a clearer understanding of 
the work of Nemoto and Sasa [66-68] on feedback control methods for estimating large deviation 
functions. We briefly discuss these methods at the end of the paper in the context of control theory 
and adaptive importance sampling. 


E. Path maximum entropy and maximum caliber 

A final link exists between our work and the maximum entropy approach to nonequilibrium 
systems proposed in the 1960s by Filyukov and Karpov [71-73], and re-worked recently by Evans [74-76] for sheared fluids. The basis of this approach, also known as the dynamical maximum entropy or maximum caliber method [77-79], is to describe the stochastic dynamics of a nonequilibrium process driven in a steady state by a path distribution maximizing the Shannon entropy subject to a constraint (e.g., current or shear state) describing the steady state. For Markov chains and jump processes, Monthus [80] has shown that this maximization yields the driven process, when expressed more generally as a constrained minimization of a path relative entropy. In Sec. III, we generalize this result to general Markov processes and relate the relative entropy to the level 2.5 of large deviations. This establishes new connections between the dynamical maximum entropy method and all the topics mentioned before, in addition to providing a probabilistic justification of this otherwise ad hoc method.
II. THEORY OF CONDITIONED AND DRIVEN PROCESSES

We review in this section the construction and physical meaning of the conditioned and driven 
Markov processes. We follow closely the notations of [2], but restrict ourselves to the case of 
pure diffusions to simplify and shorten the presentation. Jump processes and Markov chains are 
considered in Appendices B and C, respectively. Mixed or hybrid processes combining diffusive and 
jump parts can also be treated using the general language of Markov generators; see [2]. 


A. Conditioned process 

We consider an ergodic Markov process $X_t$ evolving over a time interval $[0,T]$. This process is defined mathematically by its generator $L$, whose action on functions $h$ of $X_t$ gives the evolution of their expectation according to

$$\partial_t E_x[h(X_t)] = E_x[(Lh)(X_t)], \tag{2}$$

where $E_x[\cdot]$ denotes the expectation with fixed initial state $X_0 = x$. From this relation, it can be shown that $L$ determines the transition probability kernel

$$P_{s,t}(x,dy) = \big(e^{(t-s)L}\big)(x,dy) \tag{3}$$

associated with the transition from $X_s = x$ to $X_t = y$ with $s < t$. Its dual $L^\dagger$ determines, via the master equation

$$\partial_t p(x,t) = L^\dagger p(x,t), \tag{4}$$

the evolution of probability densities (or measures in general).

Another way to define $X_t$ is to specify its path measure $dP_{L,T}(\omega)$, which corresponds intuitively to the distribution of its paths $\{X_t(\omega)\}_{t=0}^T$. This measure can be defined via finite-dimensional distributions (see [2]) and depends on $L$ and $T$, in addition to the initial density $p_0(x) = p(x,0)$. For simplicity, we assume that all paths start at $X_0 = x_0$, so that $p_0 = \delta_{x_0}$.

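We focus as mentioned on pure diffusions defined by the following (Stratonovich) stochastic differential equation (SDE):

$$dX_t = F(X_t)\,dt + \sigma(X_t)\circ dW_t,\qquad X_t\in\mathbb{R}^d, \tag{5}$$

where $F:\mathbb{R}^d\to\mathbb{R}^d$ is the drift, $\sigma:\mathbb{R}^d\to\mathbb{R}^{d\times m}$ is the diffusion field, and $W_t\in\mathbb{R}^m$ is a Brownian motion.¹ For this model, the generator is given by

$$L = F\cdot\nabla + \tfrac12(\sigma\cdot\nabla)^2 = \hat F\cdot\nabla + \tfrac12\nabla\cdot D\nabla, \tag{6}$$

where

$$\hat F(x) = F(x) - \tfrac12(\nabla\cdot\sigma)(x)\,\sigma(x) \tag{7}$$

is the modified drift and $D = \sigma\sigma^{\mathsf T}$ is the covariance matrix.² The master equation (4) is also in this case the Fokker-Planck equation, which can be expressed as the continuity equation

$$\partial_t p = -\nabla\cdot J_{F,p}, \tag{8}$$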


¹ See [2] for general SDEs involving more than one Brownian motion.

² As in [2], we consider the Stratonovich interpretation of the SDE only for convenience; other interpretations can be considered with appropriate changes. For a diffusion matrix $\sigma$ that does not depend on $X_t$, $\hat F = F$.
FIG. 1. Illustration of the conditioned process. Black: Paths of a process $X_t$ leading to typical values of the observable $A_T$ having high probability (gray shaded area). Red: Paths of $X_t$ leading to atypical values of $A_T$ having low probability (red shaded area). Each set of paths defines a conditioned process $X_t|A_T = a$ characterized by the conditional or microcanonical path measure (11). The driven process is the homogeneous Markov process describing the conditioned process in the long-time limit $T\to\infty$.


where

$$J_{F,p} = \hat F p - \frac{D}{2}\nabla p \tag{9}$$

is the Fokker-Planck probability current associated with the drift $F$ and density $p$ [81]. The invariant density $p^{\mathrm{inv}}_F(x)$ of the process satisfies $L^\dagger p^{\mathrm{inv}}_F = 0$ or

$$\nabla\cdot J_{F,p^{\mathrm{inv}}_F} = 0 \tag{10}$$

and corresponds, under our assumption that $X_t$ is ergodic, to its unique stationary density.
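The SDE (5) can be integrated numerically in the simplest case $d = m = 1$ with the stochastic Heun (predictor-corrector) scheme, which converges to the Stratonovich solution. The sketch below is a minimal illustration under that assumption, not the method used in [2]; the function name `heun_stratonovich_1d` and the Ornstein-Uhlenbeck test case $F(x) = -\gamma x$ with constant $\sigma$ are hypothetical choices. For constant $\sigma$ one has $\hat F = F$, and the invariant density (10) is Gaussian with variance $\sigma^2/(2\gamma)$, which the long-time average of $X_t^2$ should approach.

```python
import numpy as np


def heun_stratonovich_1d(F, sigma, x0, dt, n_steps, seed=0):
    """Simulate dX = F(X) dt + sigma(X) o dW in one dimension (stochastic Heun scheme).

    Averaging F and sigma over the current point and an Euler predictor is what
    makes the scheme converge to the Stratonovich, rather than the Ito, solution.
    """
    rng = np.random.default_rng(seed)
    dW = (np.sqrt(dt) * rng.standard_normal(n_steps)).tolist()
    x = float(x0)
    path = [x]
    for dw in dW:
        # Predictor: plain Euler step.
        xp = x + F(x) * dt + sigma(x) * dw
        # Corrector: average drift and noise amplitude at x and at the predictor.
        x = x + 0.5 * (F(x) + F(xp)) * dt + 0.5 * (sigma(x) + sigma(xp)) * dw
        path.append(x)
    return np.array(path)
```

With $\gamma = \sigma = 1$ and $\Delta t = 0.01$, a path of $10^5$ steps should give a time-averaged $X_t^2$ close to $1/2$; this is a statistical check, so any tolerance has to allow for sampling error.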

Physically, we imagine $X_t$ to be the state of a stochastic system involving one or many particles that are either at equilibrium or are forced into a nonequilibrium steady state by boundary reservoirs or external forces violating detailed balance. As the system evolves randomly for $t\in[0,T]$, we are interested in tracking a certain observable $A_T$ representing, for example, the work done on the system by an external force or the heat exchanged with its environment, and in studying the system’s paths leading to rare values of $A_T$ that have a low probability of being observed after a long time $T$. This means, following the introduction, that we want to study the behavior of $X_t$ given that $A_T$ is observed to be far from its typical value after a long time $T$.

This conditioning of $X_t$ is illustrated in Fig. 1 and is defined probabilistically by the conditional path measure

$$dP^{\mathrm{micro}}_{a,T}(\omega) = dP_{L,T}\{\omega\,|\,A_T = a\} = \frac{dP_{L,T}\{\omega, A_T = a\}}{P_{L,T}\{A_T = a\}}. \tag{11}$$

The superscript ‘micro’ refers to the fact that this conditional measure is a path analog of the microcanonical ensemble representing in equilibrium statistical mechanics the probability measure of a many-particle system conditioned to have a constant energy. In the following, the conditioned process defined by (11) is denoted by $X_t|A_T = a$.

An obvious question to ask about $X_t|A_T = a$ is whether this process is Markovian, that is, whether it can be described by a Markov generator and, in the case of diffusions, by an SDE that depends on the constraint $A_T = a$. To our knowledge, this is not the case for general observables with $T<\infty$ because of the non-local (in time) nature of the constraint $A_T = a$, even if we allow for non-homogeneous (i.e., time-dependent) generators. However, as we prove in [2], the conditioned process does converge in the limit $T\to\infty$ to a homogeneous Markov process, corresponding to the driven process $\hat X_t$ defined in the next subsection.

This result is derived in [2] for a large class of observables $A_T$ suggested by physical applications that depend on the state of the process $X_t$ and its jumps or increments. For pure diffusions, these observables have the general form

$$A_T = \frac1T\int_0^T f(X_t)\,dt + \frac1T\int_0^T g(X_t)\circ dX_t, \tag{12}$$

where $f$ is a scalar function, $g$ is a vector function, and $\circ$ denotes the Stratonovich product.³ The driven process is also obtained by assuming that the probability distribution of $A_T$ satisfies a large deviation principle (LDP), which means essentially that

$$P_{L,T}\{A_T = a\} \approx e^{-T I(a)} \tag{13}$$

in the limit $T\to\infty$ with subexponential corrections in $T$.
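For a path sampled on a regular time grid, the observable (12) can be approximated by sums over increments. The sketch below is one possible discretization for $d = 1$, assuming both integrands are evaluated at the midpoints of the increments; for the $g$-term this midpoint rule is the standard discretization of the Stratonovich integral. The function name `observable_A` is hypothetical.

```python
import numpy as np


def observable_A(path, dt, f, g):
    """Discretize A_T in (12) for a 1-d path sampled at times 0, dt, 2 dt, ...

    f and g are vectorized functions. Both integrands are evaluated at increment
    midpoints; for the g-term this is the Stratonovich (midpoint) rule.
    """
    x = np.asarray(path, dtype=float)
    T = dt * (len(x) - 1)
    mid = 0.5 * (x[:-1] + x[1:])
    time_part = np.sum(f(mid)) * dt
    increment_part = np.sum(g(mid) * np.diff(x))
    return float((time_part + increment_part) / T)
```

Choosing $g(x) = x$ shows the Stratonovich chain rule at work: the midpoint sum telescopes exactly to $(X_T^2 - X_0^2)/2$, whereas a left-point (Itô) sum would differ by half the sum of squared increments, which is close to $T/2$ for a unit Brownian path.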

This scaling result is found for many observables of nonequilibrium systems [17, 82-84] and implies that the probability of $A_T$ decays exponentially with $T$, except at the global minimum and zero $a^*$ of the rate function $I(a)$, where it concentrates as $T$ grows [17, 18, 85]. Hence, values $a\neq a^*$ represent fluctuations of $A_T$ that are exponentially rare with the observation time $T$, whereas $a^*$ itself represents the stationary or ergodic value of $A_T$, which becomes most probable as $T\to\infty$. In mathematical terms, this concentration of probability defines a (weak) law of large numbers, which we express here as

$$A_T \xrightarrow{P_{L,T}} a^*, \tag{14}$$

where $\xrightarrow{P_{L,T}}$ stands for the convergence in probability with respect to $P_{L,T}$ in the limit $T\to\infty$. This notation is important: it will be used later with other observables and path measures.


B. Driven process 

The definition of the driven process $\hat X_t$ involves various large deviation elements related to $A_T$. The first is the scaled cumulant generating function (SCGF) of $A_T$ defined as

$$\Lambda_k = \lim_{T\to\infty}\frac1T\ln E_{P_{L,T}}\big[e^{TkA_T}\big], \tag{15}$$

where $k\in\mathbb{R}$ and the expectation is taken with respect to the process $X_t$ with path measure $P_{L,T}$. The second element that we need is the tilted generator, given by

$$\mathcal{L}_k = \hat F\cdot(\nabla + kg) + \tfrac12(\nabla + kg)\cdot D(\nabla + kg) + kf, \tag{16}$$

where $f$ and $g$ are the two functions entering in the definition of $A_T$ and $D$ is the diffusion matrix. This linear differential operator is essentially the generator of the evolution of the generating function of $A_T$, obtained by combining Girsanov’s theorem and the Feynman-Kac formula; see Sec. 3.1 and Appendix A.2 of [2]. It plays a central role in large deviation theory, as its dominant (Perron-Frobenius) eigenvalue coincides with the SCGF of $A_T$ if we assume that the spectrum of $\mathcal{L}_k$ has a gap; see Sec. 3.2 of [2]. Under this assumption, we then have

$$\mathcal{L}_k r_k = \Lambda_k r_k, \tag{17}$$
³ The Stratonovich convention is also used here for convenience.
where $r_k$ is the ‘right’ eigenfunction associated with the dominant eigenvalue and SCGF $\Lambda_k$. For what follows, we also need the associated dual or ‘left’ eigenfunction given by

$$\mathcal{L}_k^\dagger l_k = \Lambda_k l_k. \tag{18}$$

These functions are normalized according to

$$\int l_k(x)\,dx = 1,\qquad \int l_k(x)\,r_k(x)\,dx = 1. \tag{19}$$
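The eigenvalue problem (17) can be checked numerically on a case where it is solvable by hand. As a hypothetical example, take $d = 1$, an Ornstein-Uhlenbeck drift $F(x) = -\gamma x$ with constant $\sigma$ (so $\hat F = F$ and $D = \sigma^2$), and the observable (12) with $f(x) = x$ and $g = 0$. Substituting $r_k(x) = e^{kx/\gamma}$ into (16) and (17) gives $\Lambda_k = \sigma^2k^2/(2\gamma^2)$. The sketch below discretizes $\mathcal{L}_k$ with central differences on a truncated interval with zero (Dirichlet) values outside it, an assumption that is harmless here because the relevant eigenfunctions are concentrated near the origin, and compares the dominant eigenvalue of the resulting matrix with this formula.

```python
import numpy as np


def tilted_generator_fd(gamma, sigma, k, half_width=8.0, n=601):
    """Central-difference matrix for L_k = -gamma x d/dx + (sigma^2/2) d^2/dx^2 + k x.

    Hypothetical OU test case with f(x) = x and g = 0; the real line is truncated
    to [-half_width, half_width] with zero values just outside the grid.
    """
    x = np.linspace(-half_width, half_width, n)
    h = x[1] - x[0]
    drift = -gamma * x
    diff = 0.5 * sigma**2 / h**2
    M = np.diag(-2.0 * diff + k * x)
    M += np.diag(diff + drift[:-1] / (2 * h), 1)   # coefficient of r(x + h)
    M += np.diag(diff - drift[1:] / (2 * h), -1)   # coefficient of r(x - h)
    return x, M


def scgf_fd(gamma, sigma, k):
    """Dominant eigenvalue of the discretized tilted generator, cf. (15)-(17)."""
    _, M = tilted_generator_fd(gamma, sigma, k)
    return float(np.max(np.linalg.eigvals(M).real))
```

For $\gamma = \sigma = 1$ the returned values should be close to $k^2/2$; the off-diagonal entries are positive for this grid, so the matrix is similar to a symmetric one and its spectrum is real.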


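With these elements, we define the driven process $\hat X_t$ as the Markov process with generator $L_{F_k}$, acting on functions $h$ according to

$$L_{F_k}h = r_k^{-1}\mathcal{L}_k(r_k h) - r_k^{-1}(\mathcal{L}_k r_k)\,h = r_k^{-1}\mathcal{L}_k(r_k h) - \Lambda_k h, \tag{20}$$

where $\mathcal{L}_k(r_k h)$ means that $\mathcal{L}_k$ is acting on the product $r_k h$. This transform of $\mathcal{L}_k$ is a generalization of Doob’s h-transform [3-5] which defines, as shown in [2], a homogeneous Markov process with path measure

$$dP_{F_k,T}(\omega) = e^{-T\Lambda_k + TkA_T(\omega)}\,\frac{r_k(X_T)}{r_k(x_0)}\,dP_{L,T}(\omega) \tag{21}$$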

compared to the path measure of $X_t$.⁴ The effect of this transform for diffusions [2] is to change only the drift $F$ of $X_t$ to the driven drift

$$F_k = F + D(kg + \nabla\ln r_k), \tag{22}$$

so that $\hat X_t$ satisfies the SDE

$$d\hat X_t = F_k(\hat X_t)\,dt + \sigma(\hat X_t)\circ dW_t. \tag{23}$$

This explains the subscript $F_k$ in the generator of $\hat X_t$.
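The driven drift (22) can also be extracted numerically from the Perron eigenvector of a discretized $\mathcal{L}_k$. The sketch below assumes the same hypothetical one-dimensional example as before: $F(x) = -\gamma x$, constant $\sigma$ (so $D = \sigma^2$), $f(x) = x$ and $g = 0$, for which $r_k(x) = e^{kx/\gamma}$ and (22) predicts $F_k(x) = -\gamma x + \sigma^2k/\gamma$. The finite-difference grid with zero values outside a truncated interval and the function name `driven_drift_fd` are illustrative choices.

```python
import numpy as np


def driven_drift_fd(gamma, sigma, k, half_width=8.0, n=601):
    """Estimate F_k = F + D (k g + d ln r_k / dx) from a discretized tilted generator.

    Hypothetical OU case: F(x) = -gamma x, constant sigma, f(x) = x, g = 0,
    central differences with zero values just outside [-half_width, half_width].
    """
    x = np.linspace(-half_width, half_width, n)
    h = x[1] - x[0]
    drift = -gamma * x
    diff = 0.5 * sigma**2 / h**2
    M = np.diag(-2.0 * diff + k * x)
    M += np.diag(diff + drift[:-1] / (2 * h), 1)
    M += np.diag(diff - drift[1:] / (2 * h), -1)
    w, V = np.linalg.eig(M)
    i = int(np.argmax(w.real))
    r = np.abs(V[:, i].real)            # Perron eigenvector, positive up to a sign
    dlog_r = np.gradient(np.log(r), x)
    return x, drift + sigma**2 * dlog_r  # g = 0: only the ln r_k term remains
```

Away from the edges of the grid, the estimate should agree with $-\gamma x + \sigma^2 k/\gamma$; near the edges the truncation distorts $r_k$, so only the interior is meaningful.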

The convergence of the driven process to the conditioned process $X_t|A_T = a$ follows from these results by assuming that (i) $A_T$ satisfies an LDP, (ii) the spectrum of $\mathcal{L}_k$ is gapped, and (iii) the rate function $I(a)$ is convex.⁵ Under these hypotheses, it can be shown that

$$\lim_{T\to\infty}\frac1T\ln\frac{dP^{\mathrm{micro}}_{a,T}}{dP_{F_k,T}}(\omega) = 0 \tag{24}$$

for almost all paths $\{X_t(\omega)\}_{t=0}^T$ with respect to $P^{\mathrm{micro}}_{a,T}$ or $P_{F_k,T}$ if we choose $k = I'(a)$. This limit establishes a form of process equivalence whereby

$$P^{\mathrm{micro}}_{a,T}(d\omega) \approx P_{F_k,T}(d\omega) \tag{25}$$

with subexponential corrections in $T$, as in the expression (13) of the LDP, so that these two measures are equal on a logarithmic scale. In this sense, we say that the conditioned and driven processes become asymptotically equivalent in the limit $T\to\infty$ for $k$ such that $k = I'(a)$. This holds again if $I(a)$ is convex; if $I(a)$ is nonconvex, then there is no driven process that is equivalent to the conditioned process; see Sec. 5 of [2] for more detail.⁶


⁴ We emphasize that this transform is a generalization of the Doob transform, since it relates $L_{F_k}$ and $L$ via the non-conservative generator $\mathcal{L}_k$. The resulting generator $L_{F_k}$ does, however, conserve probability.

⁵ There is a further technical assumption, namely, that the large deviations of $A_T$ do not arise as a boundary effect in time; see Secs. 5.2 and 5.3 of [2].

⁶ We assume for simplicity that $I(a)$ is differentiable. If $I(a)$ is convex but not differentiable, then $k\in\partial I(a)$, where $\partial I(a)$ is the subdifferential of $I(a)$; see [86] and Appendix 1 of [87].
This process equivalence is a direct generalization of the equivalence of the microcanonical and canonical ensembles of equilibrium statistical mechanics [87]. In fact, we prove the limit (24) in [2] in two steps using the following path generalization of the canonical ensemble:

$$dP^{\mathrm{cano}}_{k,T}(\omega) = \frac{e^{TkA_T(\omega)}}{E_{P_{L,T}}\big[e^{TkA_T}\big]}\,dP_{L,T}(\omega), \tag{26}$$

also known as the tilted path measure or s-ensemble [88-91]. First, we prove that the canonical path measure is described by a non-homogeneous (viz., time-dependent) Markov process which converges as $T\to\infty$ to a homogeneous Markov measure with generator $L_{F_k}$. Second, we use known equivalence results for the canonical and microcanonical ensembles [87] to establish a limit similar to (24) for the canonical and microcanonical path measures, which holds if $I(a)$ is convex and $k = I'(a)$. The limit (24) then follows using the chain rule for Radon-Nikodym derivatives:

$$\frac{dP^{\mathrm{micro}}_{a,T}}{dP_{F_k,T}} = \frac{dP^{\mathrm{micro}}_{a,T}}{dP^{\mathrm{cano}}_{k,T}}\,\frac{dP^{\mathrm{cano}}_{k,T}}{dP_{F_k,T}}. \tag{27}$$


Physically, the equivalence of the conditioned and driven processes also means that these two different processes have the same typical states in the long-time limit. For equilibrium systems, it is known that observables in the microcanonical and canonical ensembles can have different fluctuations, but have the same equilibrium (viz., typical) values in the infinite-volume or thermodynamic limit when the microcanonical entropy is concave as a function of energy [86, 87]. Similarly, it can be shown that time-integrated observables such as $A_T$ have in general different fluctuations with respect to the driven and conditioned processes, but have the same ergodic (viz., typical) values in the limit $T\to\infty$ when $I(a)$ is convex [2]. Thus, although the conditioned process might not be Markov for $T<\infty$, it can be described in the ergodic limit by an effective Markov process, the driven process $\hat X_t$, having the same typical values of observables. In particular, the ‘hard’ constraint $A_T = a$ of the conditioned process is achieved in the driven process in a ‘soft’ way via the limit

$$A_T \xrightarrow{P_{F_k,T}} a. \tag{28}$$
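A quick numerical check of the soft constraint (28) is possible for a hypothetical Ornstein-Uhlenbeck example: $d = 1$, $F(x) = -\gamma x$, constant $\sigma$, $f(x) = x$ and $g = 0$. For this case $r_k(x) = e^{kx/\gamma}$ solves (17) with $\Lambda_k = \sigma^2k^2/(2\gamma^2)$, so (22) gives $F_k(x) = -\gamma x + \sigma^2k/\gamma$, and $k = I'(a)$ corresponds to $a = \Lambda_k' = \sigma^2k/\gamma^2$. The sketch below simulates one long path of the driven SDE (23) with an Euler-Maruyama step (Itô and Stratonovich coincide for constant $\sigma$); the function names are illustrative.

```python
import numpy as np


def simulate_driven_ou(gamma, sigma, k, x0, dt, n_steps, seed=0):
    """Euler-Maruyama path of the driven SDE (23) for the hypothetical OU example.

    Driven drift F_k(x) = -gamma x + sigma^2 k / gamma, from (22) with r_k = exp(k x / gamma).
    """
    rng = np.random.default_rng(seed)
    noise = (sigma * np.sqrt(dt) * rng.standard_normal(n_steps)).tolist()
    shift = sigma**2 * k / gamma
    x = float(x0)
    path = [x]
    for xi in noise:
        x = x + (-gamma * x + shift) * dt + xi
        path.append(x)
    return np.array(path)


def time_average(path, dt):
    """A_T = (1/T) int_0^T X_t dt, approximated by a left-point Riemann sum."""
    T = dt * (len(path) - 1)
    return float(np.sum(path[:-1]) * dt / T)
```

With $\gamma = \sigma = 1$ and $k = 0.5$, the time average along a path of length $T = 1000$ should lie near $a = 0.5$, with fluctuations of order $T^{-1/2}$ that vanish as $T\to\infty$.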


This equivalence is proved in [2] for general observables, and holds in particular for two observables of mathematical and physical interest, namely, the empirical density

$$\rho_T(x) = \frac1T\int_0^T\delta(X_t - x)\,dt, \tag{29}$$

which represents the fraction of time spent at $x$, and the empirical current

$$J_T(x) = \frac1T\int_0^T\delta(X_t - x)\circ dX_t, \tag{30}$$

which is a time-averaged local ‘velocity’. For ergodic processes, it is known that $\rho_T$ converges in probability to the stationary density, whereas $J_T$ converges in probability to the stationary Fokker-Planck current [57]. For the driven process, the stationary density is [2]

$$p^{\mathrm{inv}}_{F_k}(x) = r_k(x)\,l_k(x). \tag{31}$$

Consequently,

$$\rho_T \xrightarrow{P_{F_k,T}} p^{\mathrm{inv}}_{F_k},\qquad J_T \xrightarrow{P_{F_k,T}} J_{F_k,p^{\mathrm{inv}}_{F_k}} \tag{32}$$

for that process. The point of equivalence is that the same limits hold for the conditioned process when $I(a)$ is convex and $k$ and $a$ are related by $k = I'(a)$. In this case, the driven and conditioned processes have the same ergodic density and ergodic probability current. This is important for the following.
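For a sampled one-dimensional path, the empirical density (29) and current (30) can be estimated by histograms. The sketch below assumes each increment is assigned to the bin containing its midpoint (a Stratonovich-type rule) and converts counts into densities by dividing by $T$ and the bin width; the function name and binning choices are hypothetical.

```python
import numpy as np


def empirical_density_current(path, dt, edges):
    """Histogram estimates of rho_T(x) and J_T(x), eqs. (29)-(30), for a 1-d path.

    Time steps and increments are binned at the midpoint of each increment and
    normalized by T times the bin width.
    """
    x = np.asarray(path, dtype=float)
    T = dt * (len(x) - 1)
    mid = 0.5 * (x[:-1] + x[1:])
    width = np.diff(edges)
    time_in_bin, _ = np.histogram(mid, bins=edges, weights=np.full(mid.shape, dt))
    displacement, _ = np.histogram(mid, bins=edges, weights=np.diff(x))
    rho = time_in_bin / (T * width)
    J = displacement / (T * width)
    return rho, J
```

When the bins cover the whole path, these estimates satisfy $\int\rho_T\,dx = 1$ and $\int J_T\,dx = (X_T - X_0)/T$, the one-dimensional version of the observable (12) with $f = 0$ and $g = 1$.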
III. VARIATIONAL REPRESENTATIONS

The driven process is intuitively the conditioning-free Markov process that realizes the conditioned 
process in the long-time limit. In this section, we propose three other ways to characterize this 
process using variational principles involving the SCGF of the conditioning observable, its rate 
function, and the relative entropy. These principles follow from general and simple large deviation 
arguments based on the contraction principle, which we first explain before deriving our main 
results. Some of these results were discussed in the literature, as mentioned in the introduction; the 
contribution of our work is to unify and to generalize them within the framework of large deviation 
conditioning and to show that their solutions give the driven process. 


A. Main ideas 

It is known from large deviation theory that the SCGF and rate function of $A_T$ can be obtained from ‘higher’ random variables $B_T$ satisfying two properties:

1. $B_T$ has an LDP with rate function $K(b)$. In our context, $B_T$ is an observable of the paths of $X_t$, as for $A_T$, so its LDP is also defined with respect to the path measure of $X_t$.

2. $A_T$ can be written as a function of $B_T$, that is, there exists a function $\mathcal{A}$ such that

$$A_T(\omega) = \mathcal{A}(B_T(\omega)) \tag{33}$$

for all paths $\{X_t(\omega)\}_{t=0}^T$.⁷ In this case, we say that the observable $A_T$ admits a representation or contraction in terms of $B_T$.

Under these assumptions, the following principles and equivalence result apply [17, 85, 86]:

• Contraction principle:

$$I(a) = \inf_{b:\,\mathcal{A}(b) = a} K(b). \tag{34}$$

The solution $b^a$ of this constrained minimization⁸ corresponds to the stationary value of $B_T$ on which $P\{B_T = b\,|\,A_T = a\}$ concentrates exponentially as $T\to\infty$, so that

$$B_T \xrightarrow{P^{\mathrm{micro}}_{a,T}} b^a. \tag{35}$$

This interpretation of $b^a$ follows by deriving from the two properties above an explicit rate function for $P\{B_T = b\,|\,A_T = a\}$; see [86], Sec. 5.3.2 of [17], and [87].

• Laplace principle:

$$\Lambda_k = \sup_b\{k\mathcal{A}(b) - K(b)\}. \tag{36}$$

This is the Lagrange multiplier or dual version of the contraction principle. Its solution $b_k$, parameterized by $k$, corresponds to the typical value of $B_T$ under the canonical path measure (26), which means

$$B_T \xrightarrow{P^{\mathrm{cano}}_{k,T}} b_k; \tag{37}$$

see [86] and Sec. 5.4 of [17]. Since the canonical measure is equivalent to the path measure of the driven process, the limit above also holds for $P_{F_k,T}$; see Sec. 5.2 of [2] for more detail.

⁷ The equality in this assumption can be weakened to $|A_T(\omega) - \mathcal{A}(B_T(\omega))| = o(1)$ in $T$ as $T\to\infty$; see [86].

⁸ We assume for simplicity that the solution is unique; see [86, 87] for more detail about the case where more than one minimizer exists.
• Equivalence: The contraction and Laplace principles have the same solutions for convex rate functions. More precisely, if $I(a)$ is convex at $a$, then $b^a = b_k$ for $k = I'(a)$. This follows from properties of Legendre-Fenchel transforms [86, 87] and is the basis of the equivalence between the conditioned and driven processes mentioned before [2].
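The three statements can be checked on a toy example where everything is explicit. The sketch below assumes a hypothetical quadratic rate function $K(b) = \frac12(b - m)\cdot Q(b - m)$ on $\mathbb{R}^2$ and a linear contraction $\mathcal{A}(b) = c\cdot b$; neither comes from the paper. Closed-form solutions of (34) and (36), obtained with a Lagrange multiplier and by setting the gradient to zero, are compared with brute-force scans, and the equivalence $b^a = b_k$ is checked at $k = I'(a) = (a - c\cdot m)/(c\cdot Q^{-1}c)$.

```python
import numpy as np

# Hypothetical quadratic example: K(b) = (b - m).Q(b - m)/2 and A(b) = c.b
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
m = np.array([0.3, -0.2])              # zero of K, i.e. the typical value of B_T
c = np.array([1.0, 2.0])
Qinv_c = np.linalg.solve(Q, c)
s = float(c @ Qinv_c)                  # c^T Q^{-1} c


def K(b):
    d = np.asarray(b) - m
    return 0.5 * float(d @ Q @ d)


def contraction(a):
    """Closed-form I(a) = inf_{b: c.b = a} K(b) and its minimizer b^a, cf. (34)."""
    b_a = m + (a - c @ m) / s * Qinv_c
    return K(b_a), b_a


def laplace(k):
    """Closed-form Lambda_k = sup_b {k c.b - K(b)} and its maximizer b_k, cf. (36)."""
    b_k = m + k * Qinv_c
    return k * float(c @ b_k) - K(b_k), b_k


def contraction_brute(a, n=20001, span=5.0):
    """Minimize K by scanning the constraint line c.b = a."""
    n_perp = np.array([-c[1], c[0]]) / np.linalg.norm(c)
    b0 = a * c / (c @ c)               # one point on the constraint line
    t = np.linspace(-span, span, n)
    d = b0 + t[:, None] * n_perp - m
    return float(np.min(0.5 * np.einsum("ij,jk,ik->i", d, Q, d)))


def laplace_brute(k, n=601, span=3.0):
    """Maximize k c.b - K(b) over a square grid."""
    g = np.linspace(-span, span, n)
    B1, B2 = np.meshgrid(g, g, indexing="ij")
    D1, D2 = B1 - m[0], B2 - m[1]
    Kgrid = 0.5 * (Q[0, 0] * D1**2 + 2 * Q[0, 1] * D1 * D2 + Q[1, 1] * D2**2)
    return float(np.max(k * (c[0] * B1 + c[1] * B2) - Kgrid))
```

For $a = 1.5$ this example gives $I(a) = 0.32$, $k = I'(a) = 0.4$ and $\Lambda_k = ka - I(a) = 0.28$, with $b^a = b_k = (0.3, 0.6)$, illustrating the Legendre duality behind the equivalence.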


The idea for the rest of this section is to apply these results using special observables $B_T$ that can be related in a one-to-one way with path measures of Markov processes. In this case, the solutions of the variational problems (34) and (36) can be put in correspondence with a Markov process that turns out to be the driven process. Intuitively, this process can thus be interpreted as the unique Markov process that ‘minimizes’ (34) and ‘maximizes’ (36) via $B_T$.

Contractions that can be used to derive these correspondences depend on the process and observable considered. For the class of Markov observables $A_T$ defined in (12), there is a simple and general contraction involving the empirical density $\rho_T$ and empirical current $J_T$.⁹ Both are indeed known to satisfy, when considered jointly as $B_T = (\rho_T, J_T)$, an LDP with rate function

$$K(\rho,j) = \begin{cases} \dfrac12\displaystyle\int [j(x) - J_{F,\rho}(x)]\cdot\frac{D^{-1}(x)}{\rho(x)}\,[j(x) - J_{F,\rho}(x)]\,dx & \text{if } \nabla\cdot j = 0,\\[6pt] \infty & \text{otherwise}. \end{cases} \tag{38}$$

Moreover, we have

$$\mathcal{A}(\rho_T, J_T) = \int f(x)\,\rho_T(x)\,dx + \int g(x)\cdot J_T(x)\,dx, \tag{39}$$

so that $A_T$ is a contraction of $B_T = (\rho_T, J_T)$.

This choice of $B_T$ defines in large deviation theory the level 2.5 of large deviations or level-2.5 LDP [57-61]. This is a natural large deviation level to consider for Markov diffusions, since the Radon-Nikodym derivative of any two homogeneous diffusions can be expressed exactly as a function of $\rho_T$ and $J_T$. This means essentially that these two random variables are sufficient to define a Markov process uniquely, which is the property needed to establish a relation between $B_T$ and the driven process.


B. SCGF 


We start by deriving variational representations for $\Lambda_k$ based on (36). Using the rate function (38) and the representation function (39) for the joint observable $(\rho_T, J_T)$, we first obtain

$$\Lambda_k = \sup_{\rho,j}\{k\mathcal{A}(\rho,j) - K(\rho,j)\}. \tag{40}$$

The maximization is performed over all normalized densities, $\int\rho(x)\,dx = 1$, and currents $j$ satisfying the ‘sourceless’ condition $\nabla\cdot j = 0$ entering in the expression of $K(\rho,j)$. The link with the driven process is established by noting that the solution $(\rho^*, j^*)$ of this maximization is

$$\rho^* = p^{\mathrm{inv}}_{F_k},\qquad j^* = J_{F_k,p^{\mathrm{inv}}_{F_k}}, \tag{41}$$

where $F_k$ is the driven drift of (22). This solution is derived explicitly in Appendix A 1 using Lagrange multipliers and can also be obtained in a simpler way by noticing, following our statement

⁹ Another general contraction can be built from the so-called empirical process, which is an abstract infinite-dimensional generalization of the empirical density $\rho_T$ defining in large deviation theory the level-3 LDP [18, 92, 93].
of the Laplace principle (36), that $(\rho^*, j^*)$ corresponds to the ergodic value of $(\rho_T, J_T)$ in the driven process. Since we know from the previous section that this process is such that

$$(\rho_T, J_T) \xrightarrow{P_{F_k,T}} \big(p^{\mathrm{inv}}_{F_k}, J_{F_k,p^{\mathrm{inv}}_{F_k}}\big), \tag{42}$$

we must therefore have (41).

Mathematically, the representation (40) can be seen as a generalization of the spectral
characterization of positive operators obtained by Donsker and Varadhan [52-54], following Kac’s derivation
of a probabilistic formula for the smallest eigenvalue of Schrödinger operators [94]. Variants of the
DV characterization were also obtained by Holland [55, 56]. As shown in Appendix A 2, the DV
result is recovered for $g = 0$ by direct minimization over $j$, leaving in (40) only the empirical density,
whose large deviations are referred to as the level 2 of large deviations. Moreover, as mentioned
in [52] and shown here in Appendix A 2, the resulting variational characterization of $\Lambda_k$ further
reduces to the Rayleigh-Ritz principle [51], commonly used in quantum mechanics, in the case
where $L = L^\dagger$, that is, when $L$ is self-adjoint with respect to the Lebesgue measure.
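A discrete analogue of the Rayleigh-Ritz reduction can be checked directly. The sketch below assumes a finite-state jump process whose rates obey detailed balance, $\pi(x)W(x,y) = \pi(y)W(y,x)$, so that the tilted generator becomes symmetric after the similarity transform $\Pi^{1/2}\mathcal{L}_k\Pi^{-1/2}$ with $\Pi = \mathrm{diag}(\pi)$. Its dominant eigenvalue is then the maximum of a Rayleigh quotient, and every trial vector gives a lower bound. The arrays `pi`, `S` and `f` are hypothetical inputs.

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])                 # target invariant law (hypothetical)
S = np.array([[0.0, 1.0, 0.4],
              [1.0, 0.0, 0.6],
              [0.4, 0.6, 0.0]])                # symmetric matrix of stationary fluxes
W = S / pi[:, None]                            # rates with pi(x) W(x,y) = S(x,y) = pi(y) W(y,x)
L = W - np.diag(W.sum(axis=1))
f = np.array([1.0, -0.5, 0.2])
k = 0.7
Lk = L + k * np.diag(f)

# Similarity transform to a symmetric matrix with the same spectrum as L_k
sq = np.sqrt(pi)
H = sq[:, None] * Lk / sq[None, :]
evals, evecs = np.linalg.eigh(H)
lam = evals[-1]                                # dominant eigenvalue Lambda_k


def rayleigh(psi):
    """Rayleigh quotient <psi, H psi> / <psi, psi>."""
    return float(psi @ H @ psi / (psi @ psi))


rng = np.random.default_rng(0)
trial_values = [rayleigh(rng.normal(size=3)) for _ in range(1000)]
print(lam, max(trial_values), rayleigh(evecs[:, -1]))
```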

The relation between the variational principle (40) and the driven process can be made more
explicit using the fact that $p$ and $j$ uniquely determine the drift. To this end, let us rewrite the
maximization in (40) by expressing the current fluctuation $j$ as $j = J_{u,p}$, where $J_{u,p}$ is the stationary
current (9) associated with a ‘free’ drift $F = u$ and diffusion matrix $D$. The constraint $\nabla\cdot j = 0$
implies that $p$ is the invariant density of a process with drift $u$ and diffusion $D$. Changing the
variables $(p, j) \to (p, u)$ then leads to


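$$\Lambda_k = \sup_u \left\{k A\big(p^{\text{inv}}_u, J_{u,p^{\text{inv}}_u}\big) - K\big(p^{\text{inv}}_u, J_{u,p^{\text{inv}}_u}\big)\right\}, \tag{43}$$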


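where

$$K\big(p^{\text{inv}}_u, J_{u,p^{\text{inv}}_u}\big) = \frac{1}{2}\int [u(x) - F(x)]\cdot D^{-1}(x)[u(x) - F(x)]\, p^{\text{inv}}_u(x)\, dx \tag{44}$$

is the level-2.5 rate function (38) expressed via (9) in terms of the drift $u$, using $J_{u,p} - J_{F,p} = (u - F)p$.
Moreover, because of (42) and the change of variables to $u$, the maximizer is now $u^* = F_k$.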
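A closed-form case helps to see how (43) and (44) work. The sketch below assumes a one-dimensional Ornstein-Uhlenbeck process $dX_t = -\gamma X_t\,dt + \sigma\,dW_t$ (so $D = \sigma^2$) with the observable $A_T = \frac{1}{T}\int_0^T X_t\,dt$, that is $f(x) = x$ and $g = 0$, and restricts $u$ to drifts $u_c(x) = -\gamma x + c$. For these drifts $p^{\text{inv}}_u$ is Gaussian with mean $c/\gamma$, so $A = c/\gamma$ and, by (44), $K = c^2/(2\sigma^2)$. Maximizing over $c$ on a grid should recover the known value $\Lambda_k = k^2\sigma^2/(2\gamma^2)$, attained at $c^* = k\sigma^2/\gamma$, which is exactly the driven drift $F_k = -\gamma x + \sigma^2 k/\gamma$ obtained from $r_k(x) = e^{kx/\gamma}$. The parameter values are hypothetical.

```python
gamma, sigma, k = 1.5, 0.8, 0.9   # hypothetical OU parameters and tilt


def objective(c):
    """k A - K of (43) for the trial drift u_c(x) = -gamma x + c."""
    a = c / gamma                  # A(p_u, J_u): mean of the Gaussian invariant law
    cost = c**2 / (2 * sigma**2)   # K from (44), since u - F = c everywhere
    return k * a - cost


cs = [i * 1e-4 for i in range(-20000, 20001)]   # grid on [-2, 2]
c_best = max(cs, key=objective)
lam_numeric = objective(c_best)
lam_exact = k**2 * sigma**2 / (2 * gamma**2)
c_exact = k * sigma**2 / gamma
print(lam_numeric, lam_exact, c_best, c_exact)
```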

This representation of the SCGF is a level-2.5 generalization of a control result of Fleming [28]
discussed in more detail in Sec. IV. It also generalizes previous results obtained for 1-d diffusions,
which are characterized by a constant current because of the sourceless condition $\nabla\cdot j = 0$. In
particular, (43) generalizes Eq. (4.9) of Sughiyama and Ohzeki [69], who consider the special
observable $A_T = X_T/T$, obtained here with $f = 0$ and $g = 1$. Additionally, it recovers Eq. (12) of
Nemoto and Sasa [67] (see also [66]), who consider the same observable for an overdamped Langevin
equation on a ring of circumference $L$, driven by a constant drive $f$, a periodic force derived from a
potential $U$, and a heat bath (Gaussian white noise) with inverse temperature $\beta$. Their main result
for this model, written in our notations $D = 2/(\beta\gamma)$ and $F = (f - U')/\gamma$, is


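$$\Lambda_k = -\frac{\beta}{4\gamma}\inf_u \int_0^L dx \left[p^{\text{inv}}_{u/\gamma+F}(x)\, u(x)^2 - 2\gamma\, J_{u/\gamma+F,\,p^{\text{inv}}_{u/\gamma+F}}\, u(x)\right] \tag{45}$$

with the minimization constraint

$$\int_0^L u(x)\, dx = \frac{2kL}{\beta}. \tag{46}$$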


Footnote: Unlike $(p^*, j^*)$, $u^*$ cannot be interpreted as a ‘most probable drift’; it is simply the drift
that makes the solution $(p^*, j^*)$ of (40) typical.

This follows from our result (43) by specializing it to this model ($f = 0$, $g = 1$ and constant $D$), which yields

$$\Lambda_k = -\frac{\beta}{4\gamma}\inf_u\left\{\int_0^L dx\, p^{\text{inv}}_{u/\gamma+F}(x)\, u(x)^2 - \frac{4kL\gamma}{\beta}\, J_{u/\gamma+F,\,p^{\text{inv}}_{u/\gamma+F}}\right\} \tag{47}$$


after the change of variables $u \to u/\gamma + F$, and by noting that the minimizer

$$u^* = \gamma(F_k - F) = \frac{2}{\beta}\left[k + (\ln r_k)'\right] \tag{48}$$

satisfies the integral constraint (46) for all parameters, since $r_k$ is periodic on the ring. Therefore, we do not change the solution of
the minimization in (47) by considering drifts $u$ for which (46) holds. Using this relation in (47) to
replace part of the factor in front of the current by an integral and inserting the constant current
inside that integral then yields (45).
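The reason (48) satisfies (46) is that $r_k$ is periodic on the ring, so $(\ln r_k)'$ integrates to zero over one period. The sketch below checks this numerically with a periodic Riemann sum, for a hypothetical periodic function $r_k(x) = \exp[0.3\cos(2\pi x/L)]$ chosen only for illustration (it is not the true eigenfunction of the ring model).

```python
import math

beta, L, k = 2.0, 1.0, 0.7   # hypothetical parameters
N = 1000                     # grid points on the ring [0, L)
dx = L / N


def dlog_r(x):
    """Derivative of ln r_k for the hypothetical r_k(x) = exp(0.3 cos(2 pi x / L))."""
    return -0.3 * (2 * math.pi / L) * math.sin(2 * math.pi * x / L)


def u_star(x):
    """Minimizer (48): u* = (2 / beta) [k + (ln r_k)']."""
    return (2 / beta) * (k + dlog_r(x))


integral = sum(u_star(i * dx) for i in range(N)) * dx   # periodic Riemann sum
target = 2 * k * L / beta                                # right-hand side of (46)
print(integral, target)
```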

This derivation shows that the constraint (46) is not fundamental: the variational problem to 
solve for general processes and observables involves, as shown in (43), only a maximization over u 
which can be interpreted as a control drift, as explained in the next section. For another study of 
the ring model focusing on current conditioning, see [1]. 


C. Rate function 

Representations of the driven process can be obtained for the rate function $I(a)$ in a dual way
from the variational representation (34). Using the explicit rate function (38) and the contraction
(39) for $(p_T, J_T)$ in (34) yields


$$I(a) = \inf_{p,j:\,A(p,j)=a} K(p,j). \tag{49}$$

If $I(a)$ is convex, the equivalence between the conditioned and driven processes implies that the
solution of the minimization above is the same as the solution of the maximization (40) giving $\Lambda_k$,
that is, $p^* = p^{\text{inv}}_{F_k}$ and $j^* = J_{F_k,\,p^{\text{inv}}_{F_k}}$, with $k$ chosen so that

$$\Lambda_k' = a, \tag{50}$$

or equivalently $k = I'(a)$. This recovers the equivalence result mentioned at the end of Sec. II about
the typical values of $p_T$ and $J_T$ being the same in the conditioned process with $A_T = a$ and in the driven
process. A different proof of this result, which mimics the proof of Appendix A 1, follows by solving
(49) with a Lagrange multiplier and by relating this multiplier to the constraint $A(p,j) = a$.
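The duality between (40) and (49), with $k$ fixed by $\Lambda_k' = a$ or $k = I'(a)$, is a Legendre-Fenchel transform. As a minimal sketch, assuming the Ornstein-Uhlenbeck example $dX_t = -\gamma X_t\,dt + \sigma\,dW_t$ with $A_T = \frac{1}{T}\int_0^T X_t\,dt$, for which $\Lambda_k = k^2\sigma^2/(2\gamma^2)$ and $I(a) = \gamma^2 a^2/(2\sigma^2)$, the transform can be computed on a grid of $k$ values and compared with the exact rate function; the maximizing $k$ should match $I'(a) = \gamma^2 a/\sigma^2$. The parameter values are hypothetical.

```python
gamma, sigma = 1.5, 0.8   # hypothetical OU parameters


def scgf(k):
    """SCGF of A_T = (1/T) int X_t dt for the OU process."""
    return k**2 * sigma**2 / (2 * gamma**2)


def rate_function(a, ks):
    """Legendre-Fenchel transform I(a) = sup_k {k a - Lambda_k} on a grid of k."""
    k_best = max(ks, key=lambda k: k * a - scgf(k))
    return k_best * a - scgf(k_best), k_best


ks = [i * 1e-3 for i in range(-10000, 10001)]   # grid on [-10, 10]
a = 0.3
I_num, k_star = rate_function(a, ks)
I_exact = gamma**2 * a**2 / (2 * sigma**2)
print(I_num, I_exact, k_star, gamma**2 * a / sigma**2)
```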

Similarly to (43), we can re-express the minimization in (49) for a fixed diffusion matrix $D$ in terms of
the drift $u$ to obtain

$$I(a) = \inf_{u:\,A(p^{\text{inv}}_u,\, J_{u,p^{\text{inv}}_u}) = a} K\big(p^{\text{inv}}_u, J_{u,p^{\text{inv}}_u}\big), \tag{51}$$

with $K(p^{\text{inv}}_u, J_{u,p^{\text{inv}}_u})$ shown in (44). As before, the minimizer is $u^* = F_k$ with $k = I'(a)$ if the rate
function $I(a)$ is convex at $a$. The interpretation of this result is that the driven process is the
controlled Markov process which maximizes the probability of $(p_T, J_T)$ under the constraint $A_T = a$.
The jump process versions of these results are discussed in Appendix B.
D. Relative entropy


The variational representations derived before can be re-expressed at the more fundamental 
level of path measures using the notion of relative entropy. We recall that, given two probability 
measures P and Q, the relative entropy of P with respect to Q is defined as


$$S(P\|Q) = \int dP(\omega)\, \ln \frac{dP}{dQ}(\omega) = \mathbb{E}_P\left[\ln \frac{dP}{dQ}\right], \tag{52}$$


where $dP/dQ$ denotes the Radon-Nikodym derivative of $P$ with respect to $Q$. This quantity is such that
$S(P\|Q) \geq 0$, with equality if and only if $P = Q$ almost everywhere. As a result, it is often used as
a distance between probability measures, called the Kullback-Leibler distance, even though it is not
symmetric and does not satisfy the triangle inequality [95].
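For discrete distributions, (52) reduces to a finite sum, which makes the positivity and the lack of symmetry easy to see. A minimal sketch, with arbitrarily chosen distributions:

```python
import math


def relative_entropy(p, q):
    """S(P||Q) = sum_x p(x) ln(p(x)/q(x)), with 0 ln 0 = 0 and +inf if P is not << Q."""
    s = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf
            s += px * math.log(px / qx)
    return s


P = [0.5, 0.4, 0.1]
Q = [0.8, 0.1, 0.1]
s_pq = relative_entropy(P, Q)
s_qp = relative_entropy(Q, P)
print(s_pq, s_qp)   # both positive, but different
```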

We state next the variational representations of $\Lambda_k$ and $I(a)$ obtained with the relative entropy
and then discuss their meaning, proofs, and equivalence with the previous representations.

The first representation for the SCGF is 


$$\Lambda_k = \lim_{T\to\infty} \sup_u \left\{k\, \mathbb{E}_{P_{u,T}}[A_T] - \frac{1}{T} S(P_{u,T}\|P_{L,T})\right\} \tag{53}$$


and involves the relative entropy between the path measure $P_{u,T}$ of a diffusion with drift $u$ and
the path measure $P_{L,T}$ of the original diffusion with drift $F$. As in (43), the solution of the
maximization is $u^* = F_k$, the drift of the driven process.

The second representation is the dual of the formula above: 

$$I(a) = \lim_{T\to\infty}\ \inf_{u:\,\mathbb{E}_{P_{u,T}}[A_T] = a} \frac{1}{T} S(P_{u,T}\|P_{L,T}). \tag{54}$$


The ‘optimal’ drift that solves this constrained minimization is $u^* = F_k$, as in (51), with $k = I'(a)$
for $I(a)$ convex. This result is interesting physically: it shows that the driven process is the
homogeneous Markov process closest to $X_t$, in the sense of relative entropy, that satisfies the
constraint $\mathbb{E}_{P_{u,T}}[A_T] = a$ or, equivalently, that makes $A_T = a$ typical in the long-time limit.

This way of expressing a rate function via a change of measure that transforms the fluctuation 
At = a for Xt into a typical event for a modified process is very common in large deviation theory: 
it is the basis of the so-called tilting method (see Appendix C.2 of [17] and Sec. 9.6 of [96]) and the
weak-convergence approach to large deviations proposed by Dupuis and Ellis [97] for discrete-time 
processes. Conceptually, the relative entropy representations (53) and (54) can also be seen as a 
contraction of the level 3 of large deviations mentioned in footnote 9. For an explanation of this
level in the simple case of independent random variables, see Sec. IL5 of [93]; for continuous-time 
processes, see Exercise 4.4.41 of [92]. 

Physically, the path representation (54) is also interesting because it provides a probabilistic 
interpretation of the maximum entropy or maximum caliber approach to nonequilibrium systems 
mentioned in the introduction. Indeed, the process obtained from (54) is not only the process that 
minimizes the path relative entropy subject to a constraint - it is the process that gives the rate 
function of At as a result of this constrained minimization, as well as the process that one obtains 
in the long-time limit by conditioning, in the probabilistic sense, the original process $X_t$ on the
constraint At = a. Some of these links were noted by Evans [75] and by Monthus [80] for specific 
observables of jump processes and Markov chains; see also Appendices B and C of this work. 
Footnote: The diffusion matrix is $D$ for both processes, so they are equivalent in the sense of absolute continuity.

We give next two proofs of the relative entropy representations (53) and (54): a simple proof
based on the positivity of the relative entropy, and a slightly more involved proof that has the
advantage of clarifying the relationship between the generalized Doob transform, the canonical path
measure, and the maximum caliber method. A third and more direct proof, which uses the level 2.5
of large deviations, is given in Appendix A 3.

The first proof proceeds from the following limit:

$$\lim_{T\to\infty}\ \inf_u \frac{1}{T} S(P_{u,T}\|P_{F_k,T}) = 0, \tag{55}$$

where $P_{F_k,T}$ denotes the path measure of the driven process. This limit holds trivially because of the
positivity of the relative entropy and the fact that the relative entropy is zero for $u = F_k$. Inserting
the expression (21) of $P_{F_k,T}$ in this limit and neglecting the end-point terms of this path measure
involving $r_k$, which are finite by assumption, we directly obtain the relative entropy principle (53)
and its constrained version (54) by duality.
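The relative entropy principle (53) has a direct analogue for finite-state jump processes, which can be checked numerically. The sketch below assumes the standard expression for the relative entropy rate of a stationary jump process with rates $W_u$ with respect to one with rates $W$, namely $\sum_x p_u(x)\sum_{y\neq x}[W_u(x,y)\ln(W_u(x,y)/W(x,y)) - W_u(x,y) + W(x,y)]$, and evaluates $k\langle f\rangle_{p_u}$ minus this rate. For the driven rates $W(x,y)r_k(y)/r_k(x)$ the result should equal $\Lambda_k$, while randomly perturbed rates should only give lower values. The rates and the observable are hypothetical.

```python
import numpy as np


def generator(W):
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    return W - np.diag(W.sum(axis=1))


def perron(M):
    vals, vecs = np.linalg.eig(M)
    i = np.argmax(vals.real)
    return vals[i].real, np.abs(vecs[:, i].real)


def stationary(W):
    _, p = perron(generator(W).T)
    return p / p.sum()


def entropy_rate(Wu, W, p):
    """Relative entropy per unit time of the stationary process with rates Wu w.r.t. rates W."""
    off = ~np.eye(len(p), dtype=bool)
    ratio = np.ones_like(W)
    ratio[off] = Wu[off] / W[off]
    terms = np.where(off, Wu * np.log(ratio) - Wu + W, 0.0)
    return float(p @ terms.sum(axis=1))


def functional(Wu, W, f, k):
    """Jump-process analogue of k E[A_T] - S/T in (53), in the stationary regime."""
    p = stationary(Wu)
    return k * float(f @ p) - entropy_rate(Wu, W, p)


W = np.array([[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.3, 0.7, 0.0]])
f = np.array([0.0, 1.0, 2.0])
k = 0.4
lam, r = perron(generator(W) + k * np.diag(f))
W_driven = W * r[None, :] / r[:, None]          # generalized Doob transform
value_driven = functional(W_driven, W, f, k)

rng = np.random.default_rng(1)
values_other = [functional(W * np.exp(rng.normal(scale=0.5, size=W.shape)), W, f, k)
                for _ in range(200)]
print(lam, value_driven, max(values_other))
```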

This is by far the simplest proof that we have of the relative entropy principles and, in fact, of 
all the variational representations derived in the previous sections, since these are equivalent to the 
relative entropy representations, as shown in Appendix A 3. The limit (55) thus provides a
simple and powerful way to obtain large deviation functions at the path level, and can also be used
as an alternative method for proving the control results of Sec. IV. 

The second proof of (53) and (54) proceeds differently. It uses the well-known fact that the
canonical path measure defined in (26) is the unique solution for all $T < \infty$ of the following
variational problem:

$$\frac{1}{T}\ln \mathbb{E}_{P_{L,T}}\big[e^{TkA_T}\big] = \sup_{Q_T}\left\{k\, \mathbb{E}_{Q_T}[A_T] - \frac{1}{T} S(Q_T\|P_{L,T})\right\}, \tag{56}$$

where $Q_T$ is any path measure (not necessarily Markovian) over the time interval $[0, T]$ [98]. This
problem is a variant of the Gibbs or Kullback inequality. Assuming that the limit (15) defining the
SCGF exists, we then obtain [69]

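$$\begin{aligned}
\Lambda_k &= \lim_{T\to\infty}\sup_{Q_T}\left\{k\,\mathbb{E}_{Q_T}[A_T] - \frac{1}{T}S(Q_T\|P_{L,T})\right\}\\
&= \lim_{T\to\infty}\left\{k\,\mathbb{E}_{P^{\text{canon}}_{k,T}}[A_T] - \frac{1}{T}S\big(P^{\text{canon}}_{k,T}\|P_{L,T}\big)\right\}.
\end{aligned} \tag{57}$$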

The driven process comes into this by noting that the canonical path measure defines a
non-homogeneous Markov process with time-dependent generator, which becomes asymptotically
equivalent to the time-homogeneous driven process $\hat X_t$ with drift $F_k$ in the long-time limit, so
that


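$$\lim_{T\to\infty}\frac{1}{T} S\big(P^{\text{canon}}_{k,T}\|P_{F_k,T}\big) = 0, \tag{58}$$

where $P_{F_k,T}$ denotes the path measure of the driven process.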


This process equivalence is proved in [2] and implies that the canonical path measure can be replaced
in (57) by the driven process: both have the same ergodic states and the same asymptotic relative
entropy relative to $X_t$, so that (57) is equivalent to (53) with $u^* = F_k$. This is also evident by
comparing (58) with (55). In the end, the driven process $\hat X_t$ can thus be characterized as the
homogeneous Markov process closest, in the sense of relative entropy, to the non-homogeneous
canonical path measure. Because of the equivalence between the canonical and microcanonical
path measures expressed in (24), it is also the homogeneous Markov process closest, in the sense of
relative entropy, to the conditioned process $X_t | A_T = a$.
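The variational problem (56) behind the second proof is an instance of the Gibbs variational principle $\ln \mathbb{E}_P[e^{\Phi}] = \sup_Q\{\mathbb{E}_Q[\Phi] - S(Q\|P)\}$, whose maximizer is the ‘canonical’ measure $dQ/dP = e^{\Phi}/\mathbb{E}_P[e^{\Phi}]$; (56) corresponds to $\Phi = TkA_T$ after division by $T$. The sketch below checks this on a finite sample space, with a hypothetical reference law `P` and function `Phi`.

```python
import math
import random


def objective(Q, P, Phi):
    """E_Q[Phi] - S(Q||P) for discrete laws Q and P."""
    eq = sum(q * x for q, x in zip(Q, Phi))
    s = sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)
    return eq - s


P = [0.1, 0.2, 0.3, 0.4]       # reference law (role of P_{L,T})
Phi = [1.0, -0.5, 2.0, 0.3]    # role of T k A_T
Z = sum(p * math.exp(x) for p, x in zip(P, Phi))
log_Z = math.log(Z)            # left-hand side of the Gibbs principle
Q_canon = [p * math.exp(x) / Z for p, x in zip(P, Phi)]


def random_law(rng, n):
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(2)
best_random = max(objective(random_law(rng, 4), P, Phi) for _ in range(2000))
print(log_Z, objective(Q_canon, P, Phi), best_random)
```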
E. Spectral eigenfunctions


We close this section by presenting three other variational representations that characterize the
spectral elements $r_k$ and $l_k$ rather than the driven process itself. The first is obtained by applying a
change of variables to the variational principle (43) which yields, as shown in Appendix A 4,

$$\Lambda_k = \sup_{h>0} \int dx\, p^{\text{inv}}_{F+D(kg+\nabla \ln h)}(x)\,\frac{(\mathcal{L}_k h)(x)}{h(x)}. \tag{59}$$

The maximizer is $h^* = r_k$. This is a generalization of a result derived by Nemoto and Sasa for
jump processes; see Appendix G of [66]. A more direct proof of this result follows by solving the
maximization using functional derivatives and by finding that $h^* = r_k$ is the global maximum.
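A finite-state analogue of (59), in the spirit of the jump-process result of Nemoto and Sasa cited above, can be tested numerically. The sketch below assumes that, for a jump process with rates $W$, a trial function $h > 0$ defines modified rates $W(x,y)h(y)/h(x)$ with invariant law $p_h$, and evaluates $\sum_x p_h(x)(\mathcal{L}_k h)(x)/h(x)$ with $\mathcal{L}_k$ the tilted generator of a state observable ($g = 0$). The value should equal $\Lambda_k$ for $h = r_k$ and stay below it for other choices. All inputs are hypothetical.

```python
import numpy as np

W = np.array([[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.3, 0.7, 0.0]])
f = np.array([0.0, 1.0, 2.0])
k = 0.4
Lk = W - np.diag(W.sum(axis=1)) + k * np.diag(f)   # tilted generator


def stationary(Q):
    """Invariant law of the generator Q (left null vector, normalized)."""
    vals, vecs = np.linalg.eig(Q.T)
    p = np.abs(vecs[:, np.argmax(vals.real)].real)
    return p / p.sum()


def value(h):
    """Analogue of the functional in (59) for a positive trial function h."""
    Wh = W * h[None, :] / h[:, None]               # modified rates W(x,y) h(y) / h(x)
    Qh = Wh - np.diag(Wh.sum(axis=1))
    return float(stationary(Qh) @ (Lk @ h / h))


vals, vecs = np.linalg.eig(Lk)
i = np.argmax(vals.real)
lam, r = vals[i].real, np.abs(vecs[:, i].real)

rng = np.random.default_rng(3)
trials = [value(np.exp(rng.normal(size=3))) for _ in range(500)]
print(lam, value(r), max(trials))
```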
The second representation, obtained in the special case g = 0, is 


$$\Lambda_k = \underset{l,r:\ \int l(x)r(x)\,dx = 1}{\text{stat.pt.}} \int l(x)(\mathcal{L}_k r)(x)\, dx, \tag{60}$$

where stat.pt. stands for the stationary point(s) of the expression on the right side, which are
explicitly $l^* = l_k$ and $r^* = r_k$. The dual of this result gives $I(a)$ as


$$I(a) = -\underset{l,r}{\text{stat.pt.}} \int l(x)(L r)(x)\, dx \tag{61}$$

subject to the two constraints

$$\int l(x)r(x)\,dx = 1 \quad\text{and}\quad \int l(x)r(x)f(x)\,dx = a. \tag{62}$$

This holds again for $g = 0$ and is solved as before for $l^* = l_k$ and $r^* = r_k$.
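For a finite state space, (60)-(62) become statements about the left and right Perron eigenvectors of the tilted matrix. The sketch below, assuming a hypothetical three-state jump process and $g = 0$ (so that $\mathcal{L}_k = L + k\,\mathrm{diag}(f)$), checks that $(l_k, r_k)$ normalized by $\sum_x l_k(x) r_k(x) = 1$ is a stationary point of the constrained functional, that the value of (60) is $\Lambda_k$, and that $-\sum_x l_k(x)(Lr_k)(x) = ka - \Lambda_k$ with $a = \sum_x l_k(x)r_k(x)f(x) = \Lambda_k'$, which is the Legendre value $I(a)$ at $k = I'(a)$.

```python
import numpy as np

W = np.array([[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.3, 0.7, 0.0]])
f = np.array([0.0, 1.0, 2.0])
k = 0.4
L = W - np.diag(W.sum(axis=1))
Lk = L + k * np.diag(f)


def perron(M):
    vals, vecs = np.linalg.eig(M)
    i = np.argmax(vals.real)
    return vals[i].real, np.abs(vecs[:, i].real)


lam, r = perron(Lk)          # right eigenvector r_k
_, l = perron(Lk.T)          # left eigenvector l_k
l = l / (l @ r)              # normalization: sum_x l(x) r(x) = 1

action = l @ Lk @ r          # value of the functional in (60)
grad_l = Lk @ r - lam * r    # derivative w.r.t. l of l.Lk.r - lam (l.r - 1)
grad_r = Lk.T @ l - lam * l  # derivative w.r.t. r
a = float((l * r) @ f)       # constraint value in (62)
I_a = -float(l @ L @ r)      # value of (61)
print(action, lam, I_a, k * a - lam)
```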

The variational principles (60) and (61) are derived in Appendix A 5 from our previous
representations. The last one was previously derived using different methods by Eyink
[63] (see also Symanzik [99]), who refers to it as the action principle generalizing the Rayleigh-Ritz
principle. We reproduce it here for completeness. Applications of this result have been studied in
the context of turbulence [65] and a simple Kramers model of nonequilibrium systems [64].

The representation (60) is obviously weaker than (59), since it applies only for g = 0 and involves 
an optimization on two functions, compared to one in (59), which does not necessarily yield a 
global maximum. From (59), it is tempting to think that additional conditions (e.g., convexity of 
the rate function) might strengthen (60) and (61) to yield a global maximum or minimum rather 
than a stationary point. We have not been able to obtain any result in that direction. We also do 
not know whether the two representations involving stationary points above can be generalized to 
current-type observables with $g \neq 0$.


IV. OPTIMAL CONTROL REPRESENTATIONS 

The variational principles derived in the previous section, especially those expressed in terms of 
the drift u, suggest that the driven process is a control process optimizing functionals related to 
the large deviations of $A_T$. We formalize this idea in this section by defining a controlled process
explicitly and by showing that the functionals optimized have the form of ‘empirical’ or ‘running’
costs accumulated over the time interval or horizon [0,T]. Moreover, we show that the so-called
value function, corresponding to the optimal control cost, satisfies an HJB equation and yields the
optimal control drift, which converges in the ergodic limit to the driven drift. Relations between 
these results, those of Fleming and collaborators [20-30], and the more recent work of Bierkens, 
Chernyak, Chertkov and Kappen [46, 47] are discussed at the end of the section. 
A. Controlled process

We consider the problem of maximizing the cost functional

$$C_t^T[X^u, u] = k\int_t^T \left[f(X^u_s)\,ds + g(X^u_s)\circ dX^u_s\right] - \frac{1}{2}\int_t^T (u_s - F)\cdot D^{-1}(u_s - F)(X^u_s)\, ds \tag{63}$$
for the controlled SDE 

$$dX^u_s = u_s\, ds + \sigma(X^u_s)\circ dW_s, \qquad s \in [t, T], \tag{64}$$

driven by the control drift $u_s$. The cost function $C_t^T$ comes from the variational principles of the
previous section, in particular from (43): the first integral measures with the Lagrange multiplier $k$
the cost of reaching the constraint $A_T = a$, expressed now over a time interval $[t, T]$ rather than
$[0, T]$, while the second integral measures according to (A16) the cost of the control as the relative
entropy between the controlled process $X^u_t$ with drift $u$ and the original uncontrolled process $X_t$
with drift $F$. Importantly for the theory, this cost is additive in time and quadratic in the control
drift $u_t$.

Stochastic control theory [41-45] is concerned with determining the optimal control strategy
that maximizes the expected cost starting with the initial condition $X^u_t = x$. The optimal
expected cost function is called the value function and is denoted here by $\lambda_t^T(x,k)$. Thus,

$$u^* = \arg\sup_u \mathbb{E}^u_x\big[C_t^T\big] \tag{65}$$

and

$$\lambda_t^T(x,k) = \sup_u \mathbb{E}^u_x\big[C_t^T\big], \tag{66}$$

where the expectation is with respect to the controlled process $X^u_s$ started at $X^u_t = x$.


B. Optimal controller 

The solutions of (65) and (66) are obtained from standard results of stochastic control theory
adapted in Appendix A 6 to control costs containing a displacement or current cost with $g \neq 0$.
The control problem involving these costs satisfies the dynamic programming principle of Bellman,
which leads to the following stochastic Hamilton-Jacobi-Bellman (HJB) equation:

$$-\partial_t \lambda_t^T = \sup_u\left\{kf + kg\cdot u - \frac{1}{2}(u - F)\cdot D^{-1}(u - F) + \frac{k}{2}\nabla\cdot(Dg) + L_u \lambda_t^T\right\} \tag{67}$$

with final condition $\lambda_T^T = 0$. This equation is derived in Appendix A 6; it involves the Markov
generator $L_u$ of the controlled diffusion $X^u_t$ defined in (64) and reduces for $g = 0$ to the standard
stochastic HJB equation. The optimal control law (65) is the maximizer of this equation and is
given, after inserting the expression of the generator and performing the maximization, by

$$u_t^* = F + D(kg + \nabla\lambda_t^T). \tag{68}$$

This is a non-homogeneous optimal control law that depends explicitly, as is normal in control
theory, on the initial and final times of the control horizon $[t,T]$. Putting $g = 0$ yields the standard
solution $u_t^* = F + D\nabla\lambda_t^T$ [44].
To obtain the long-time limit of these results, we consider the exponential cost $G_t^T(x,k) = e^{\lambda_t^T(x,k)}$.

We show in Appendix A 7 that the new function $G_t^T(x,k)$ obtained from this Hopf-Cole transform
satisfies the backward Feynman-Kac equation

$$(\partial_t + \mathcal{L}_k)G_t^T = 0 \tag{69}$$

with $G_T^T = 1$ and $\mathcal{L}_k$ the tilted generator defined in (16). The solution of this equation is obtained
formally by integrating the semi-group generated by $\mathcal{L}_k$ over the final position $X_T = y$:

$$G_t^T(x,k) = \int e^{(T-t)\mathcal{L}_k}(x,y)\, dy. \tag{70}$$

Using the assumption that the spectrum of $\mathcal{L}_k$ has a gap, denoted by $\Delta_k$, it can then be shown (see
[2] for details) that $G_t^T(x,k)$ behaves exponentially for large $T - t$ according to

$$G_t^T(x,k) = r_k(x)\, e^{(T-t)\Lambda_k}\left[1 + O\big(e^{-(T-t)\Delta_k}\big)\right]. \tag{71}$$

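Consequently,

$$\lim_{T\to\infty}\frac{\lambda_t^T(x,k)}{T} = \lim_{T\to\infty}\frac{1}{T}\ln G_t^T(x,k) = \Lambda_k \tag{72}$$

and

$$\lim_{T\to\infty}\nabla\lambda_t^T(x,k) = \lim_{T\to\infty}\frac{\nabla G_t^T(x,k)}{G_t^T(x,k)} = \frac{\nabla r_k(x)}{r_k(x)} = \nabla\ln r_k(x), \tag{73}$$

so that

$$\lim_{T\to\infty} u_t^* = F + D(kg + \nabla\ln r_k) = F_k \tag{74}$$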

by (68) and (22). This shows that the driven process is the optimal control process maximizing the
expectation of the cost $C_t^T$ as $T \to \infty$. To be more precise, it is the optimal process that maximizes
the expected cost per unit time, which by (72) converges to $\Lambda_k$ in the ergodic limit.
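The long-time behaviour (71)-(74) can be illustrated on a finite-state analogue, assumed here in place of a diffusion. The sketch below integrates $\partial_s G = \mathcal{L}_k G$ in the reversed time $s = T - t$, starting from $G = 1$ (equivalent to (69) with $G_T^T = 1$), with a classical fourth-order Runge-Kutta scheme. It then checks that the growth rate of $\ln G$ approaches $\Lambda_k$, as in (72), and that the ratios $G(y)/G(x)$, which play the role of the gradient of the value function in the optimal control (68), approach $r_k(y)/r_k(x)$, as in (73)-(74). Rates, observable and step size are hypothetical choices.

```python
import numpy as np

W = np.array([[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.3, 0.7, 0.0]])
f = np.array([0.0, 1.0, 2.0])
k = 0.4
Lk = W - np.diag(W.sum(axis=1)) + k * np.diag(f)   # tilted generator


def feynman_kac(T, dt=1e-3):
    """Solve dG/ds = L_k G for s in [0, T] with G(0) = 1 (s = T - t), using RK4."""
    G = np.ones(len(f))
    for _ in range(int(round(T / dt))):
        k1 = Lk @ G
        k2 = Lk @ (G + 0.5 * dt * k1)
        k3 = Lk @ (G + 0.5 * dt * k2)
        k4 = Lk @ (G + dt * k3)
        G = G + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return G


G10, G20 = feynman_kac(10.0), feynman_kac(20.0)
growth_rate = (np.log(G20) - np.log(G10)) / 10.0   # estimate of Lambda_k for each x

vals, vecs = np.linalg.eig(Lk)
i = np.argmax(vals.real)
lam, r = vals[i].real, np.abs(vecs[:, i].real)
print(growth_rate, lam)
print(G20 / G20[0], r / r[0])
```

Taking the difference of $\ln G$ at two horizons removes the $O(1)$ prefactor $\ln r_k(x)$ appearing in (71), so the growth rate converges exponentially fast rather than as $1/T$.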

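We can strengthen this result slightly using convergence in probability instead of convergence in
mean by considering the time-rescaled cost $C_0^T/T$, whose maximal expectation $\lambda_0^T/T$ converges according to (72) to the SCGF
and dominant eigenvalue $\Lambda_k$. Thus,

$$\Lambda_k = \lim_{T\to\infty}\sup_u \mathbb{E}\left[\frac{C_0^T}{T}\right] = \lim_{T\to\infty}\sup_u \mathbb{E}\left[kA_T^u - K_T^u\right], \tag{75}$$

where $A_T^u$ is our usual observable evaluated for $X^u_t$ and

$$K_T^u = \frac{1}{2T}\int_0^T (u - F)\cdot D^{-1}(u - F)(X^u_t)\, dt \tag{76}$$

is an ‘empirical’ or ‘sample mean’ version of the level-2.5 rate function shown in (44). Noting that

$$A_T^u \to A\big(p^{\text{inv}}_u, J_{u,p^{\text{inv}}_u}\big) \quad\text{and}\quad K_T^u \to K\big(p^{\text{inv}}_u, J_{u,p^{\text{inv}}_u}\big), \tag{77}$$

we can then rewrite (75) as

$$\Lambda_k = \lim_{T\to\infty}\sup_u\left\{kA_T^u - K_T^u\right\}, \tag{78}$$

which is now understood as a limit in probability with respect to the law of $X^u_t$. The optimal
controller solving this maximization is $u^* = F_k$.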
C. Comparison with previous works

The derivation above follows essentially the theory of Fleming and collaborators [20-30] relating
linear PDEs to control problems. The main difference is our consideration of a new $g$ term in the
cost related to observables $A_T$ that depend on the displacements or increments of $X_t$ in addition to
the state $X_t$ itself. For $g = 0$, it can be checked that the expected cost appearing in (75) with (76)
is equivalent to the cost (3.10) obtained by Fleming in [28].

Conceptually, our approach is also a reversal of Fleming’s, in that we transform the nonlinear
HJB PDE (67) into the linear Feynman-Kac PDE (69). Moreover, we provide, as mentioned in the
introduction, a new probabilistic interpretation of the optimal control process arising in Fleming’s
theory as the process optimizing the cost $C_t^T$. In the ergodic limit, this process is the driven process
$\hat X_t$ and, by equivalence, the conditioned process $X_t | A_T = a$.

This not only complements Fleming’s theory, but provides, as also mentioned in the introduction, 
a different approach to the recent work of Bierkens, Chernyak, Chertkov, and Kappen [46, 47], who 
generalize optimal control theory to current-type costs by working directly with ergodic controls 
for the cost (63). More precisely, they consider a variational principle similar to (43) involving a 
control drift u (Problem 2.3 in [46]), which they transform to a variational principle of the type 
(40) for p and j (Problem 3.13 in [46]). From the latter principle, they then deduce an equation for 
the maximizers p* and j*, which they call the HJB equation, and apply a Hopf-Cole transform to 
obtain a linear equation (in Theorem 5.10 of [46]) which is essentially our equation (17) defining
$r_k$ and $\Lambda_k$. In doing so, they observe that they generalize the DV characterization of dominant
eigenvalues of positive operators, but they do not identify the optimal control as the driven or 
conditioned process. 

There are two other connections worth mentioning. The first is with the weak-convergence
approach of Dupuis and Ellis [97], alluded to in Sec. III D, which is very close conceptually to our
derivation of the SCGF based on the canonical path measure and the variational problem (56). The
second connection is with Bierkens and Kappen [100], who solve the optimization problem

$$J^* = \inf_Q\left\{\mathbb{E}_Q[C] + S(Q\|P)\right\} \tag{79}$$

for general probability measures $Q$ and $P$, including path measures of Markov processes, and notice
that the solution is a canonical measure (see [98] for related results). This is consistent with both
the relative entropy representation (53) derived in the previous section, which has the form (79)
and which yields the driven process as the ergodic limit of the canonical path measure, and the
control results of this section, which express this representation via a drift-controlled process. It
seems in fact that some of the optimal control processes obtained in [100] can be expressed as a
generalized Doob transform similar to (20).


V. CONCLUSIONS 

We give in Fig. 2 a summary diagram containing the main results of this paper in the center 
and various links between these results, the driven process, and the elements (spectral and large 
deviation) used to construct that process. In order not to overfill the diagram, we include on the 
left-hand side of the diagram links with previous works that directly motivated this paper; more 
links are explained in the text. Our own contributions, which occupy the center and right-hand 
side of the diagram, are 

• to derive general variational principles or representations for $\Lambda_k$ and $I(a)$, based on the level
2.5 of large deviations, generalizing previous representations obtained by Nemoto and Sasa
[66-68] and by Eyink [63-65] among others;
FIG. 2. Summary diagram of our results with important links. Acronyms: PF = Perron-Frobenius,
FS = Fleming-Sheu, CCBK = Chernyak, Chertkov, Bierkens, Kappen, RR = Rayleigh-Ritz, DV = Donsker-Varadhan,
H = Holland, E = Eyink, GFK = Girsanov-Feynman-Kac, DE = Dupuis-Ellis, NS = Nemoto-Sasa;
see the text for more links and references.


• to explicitly link these variational principles to the control approach to large deviations 
developed by Fleming [20-30]; 

• to show that the solution of these variational principles and control problems is the driven 
process and, when equivalence holds, the conditioned process. 

We have focused in the previous sections on deriving these principles and explaining how they 
follow from the contraction principle and Laplace principle of large deviation theory. In the 
remainder, we discuss three important applications related to the physics of large deviations in
nonequilibrium systems and approximations (analytical or numerical) of large deviation functions. 
The discussion is meant to be brief; our goal is to give a few remarks pointing to how the variational 
representations derived here can be used to study nonequilibrium systems and to compute large 
deviation functions describing their fluctuations. We plan to develop each of these remarks, especially 
those related to numerical methods for estimating large deviation functions, more extensively in 
a series of future publications. The full implementation of these methods, and their comparison 
with other methods such as cloning [101-103], represent an important and challenging problem for 
nonequilibrium statistical physics and large deviation theory as a whole. 


A. Physics of nonequilibrium fluctuations 

We have already mentioned in the introduction that our theory of conditioned and driven
processes can be seen as a generalization of the FWG theory [14-16] when applied to time-integrated
observables. The starting point of both theories is the same: to describe how fluctuations ‘arise’
or ‘are created’ spontaneously from noise. Moreover, both refer to a conditioning problem. In
the FWG theory, this conditioning selects a single path, called the fluctuation path or instanton,
because of the large deviation form of the path measure in the low-noise limit [14]. By contrast, in
our theory the conditioning does not select a single path in general, but a whole set of paths, or
process, identified as the driven process $\hat X_t$ in the ergodic limit.

We have seen another interesting way to characterize the driven process, namely, as a process 
that transforms a fluctuation into a typical state. In general, many different stochastic processes 
can be used to achieve this process transformation or path reweighting. The driven process is 
special among reweightings in that it is the unique homogeneous Markov process closest to the 
original process Xt, with respect to the ‘distance measure’ defined by the relative entropy, which 
transforms Xt to make the fluctuation At = a typical. This provides, as mentioned, a first-principle 
justification of the maximum entropy approach to nonequilibrium systems, which was proposed as 
an ad hoc generalization of the notion of statistical ensembles to these systems [77-79]. 

Many open problems remain about the application of the driven process to study nonequilibrium 
systems beyond calculating large deviation functions. In particular, 

• Can nonequilibrium systems be represented as a conditioning of equilibrium systems? Consider,
to be more precise, an extended many-particle system driven at its boundary by particle
or energy reservoirs. Can this system be mapped, exactly or approximately, to an equilibrium
system conditioned on some observable (e.g., the current)? For what class of systems and
observables is this mapping possible?

• Can we obtain the FWG theory as the low-noise limit of the driven process? In other words,
for which class of processes and observables is the low-noise limit of the driven process the adjoint
deterministic dynamics predicted by the FWG theory?

These problems can be formulated, interestingly, in control terms. The first one, which was 
proposed by Evans [74-76] and which served as a direct motivation for our work on large deviation 
conditioning, can be rephrased by asking whether the solution of the optimal control problem of 
Sec. IV, defined for an initial equilibrium forcing F, is a nonequilibrium process controlled in the 
stationary limit by boundary fields or forces. The second problem relates, on the other hand, to 
the low-noise limit of stochastic control problems and viscosity solutions of HJB equations, two 
problems that have been studied extensively in control theory; see [44]. 

Many other problems of nonequilibrium statistical mechanics can be formulated similarly by 
appealing to control theory, so we expect the approach and links presented here to be useful 
for solving them. Of particular importance are applications to interacting systems, such as the 
exclusion process and the zero-range process, which have been studied recently in terms of canonical 
and grand-canonical versions of current conditioning in [104-107]. For more applications, see the 
references cited in Sec. 6 of [2], and [108]. 


B. Variational approximations 

The variational principles derived here are generally not easy to solve, since they involve spectral 
elements that are difficult to obtain analytically or numerically and require, in some cases, the 
determination of stationary distributions of nonequilibrium systems. However, it is possible to 
approximate their solutions, as commonly done in optimization theory, by restricting or projecting 
the possible minimizers or maximizers on specific classes of functions. 

The derivation of these approximations is also a control problem: one restricts the optimal control 
problem to a specific class of controllers, called a control design. For example, one can restrict the 
variational principles involving the drift u to drifts that are linear in the state $X_t$ or that are gradient fields,
so as to obtain tractable approximations of large deviation functions or dominant eigenvalues. In
physics, the idea of control design is equivalent to proposing an ansatz, basis functions, trial solutions or
trial wavefunctions for optimization problems such as the Rayleigh-Ritz principle. For applications
of these approximations for jump processes and diffusions, see [40, 64, 67, 70, 91, 109-111].

The quality of the approximations obtained from ‘limited’ or ‘suboptimal’ control designs depends 
on the system and observable considered, and is essentially given by the relative entropy between the
‘true’ control solution and the ‘approximate’ control solution obtained for a given design. Moreover,
because of the maximum or minimum involved in the variational representations, one obtains not 
just an approximation of the true large deviation function or eigenvalue considered, but a lower 
bound (in the case of ‘sup’ principles) or an upper bound (in the case of ‘inf’ principles), which can 
only be improved by enlarging the class of controls considered. This monotonicity property is very 
useful in practice to determine the convergence and quality of approximations without the prior 
knowledge of ‘true’ solutions. 
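The lower-bound property can be seen on an Ornstein-Uhlenbeck example, assumed here only for illustration: $dX_t = -\gamma X_t\,dt + \sigma\,dW_t$ with $A_T = \frac{1}{T}\int_0^T X_t\,dt$, for which $\Lambda_k = k^2\sigma^2/(2\gamma^2)$. With linear control designs $u(x) = -ax + c$, whose invariant law is Gaussian with mean $c/a$ and variance $\sigma^2/(2a)$, the functional of (43) has a closed form. Freezing $a$ at a wrong value gives a strictly lower bound, while optimizing over both $a$ and $c$ recovers $\Lambda_k$ up to grid error, because the exact driven drift belongs to this design. The parameters and grids are hypothetical.

```python
gamma, sigma, k = 1.5, 0.8, 0.9   # hypothetical OU parameters and tilt


def objective(a, c):
    """k A - K of (43) for the linear design u(x) = -a x + c, with a > 0."""
    m, v = c / a, sigma**2 / (2 * a)       # mean and variance of the invariant Gaussian
    # K = E[(u - F)^2] / (2 sigma^2) with u - F = (gamma - a) x + c and x ~ N(m, v)
    second_moment = (gamma - a)**2 * (v + m**2) + 2 * c * (gamma - a) * m + c**2
    return k * m - second_moment / (2 * sigma**2)


cs = [i * 2e-3 for i in range(-1500, 1501)]          # c in [-3, 3]
a_grid = [0.75 + j * 0.01 for j in range(151)]       # a in [0.75, 2.25]
bound_fixed_a = max(objective(0.75, c) for c in cs)  # smaller design: a frozen at 0.75
bound_full = max(objective(a, c) for a in a_grid for c in cs)
lam_exact = k**2 * sigma**2 / (2 * gamma**2)
print(bound_fixed_a, bound_full, lam_exact)
```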


C. Numerical algorithms for large deviation functions 


The driven process can be used in three different ways to compute large deviation functions 
numerically: 

• Implement the variational approximations described before using specific control designs or 
trial ansatz leading to bounds on SCGFs and rate functions. For applications of this method, 
see [64, 109]. 

• Use control and dynamic programming techniques to solve the HJB equation (67) giving 
the finite-time value function and the SCGF in the ergodic limit. This method has 
not been used yet for studying time-integrated observables of nonequilibrium systems; see 
[40, 110, 111] for related equilibrium applications in the low-temperature (FWG) limit. 

• Use the driven process as a change of measure in importance sampling. 

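The idea of the last method is to simulate the driven process $\hat X_t$ rather than the original process
$X_t$ so as to estimate the probability

$$P_{L,T}\{A_T = a\} = \mathbb{E}_{P_{L,T}}\big[\delta(A_T - a)\big] \tag{80}$$

using

$$P_{L,T}\{A_T = a\} = \mathbb{E}_{P_{F_k,T}}\left[\delta(A_T - a)\,\frac{dP_{L,T}}{dP_{F_k,T}}\right], \tag{81}$$

where $P_{F_k,T}$ denotes the path measure of the driven process.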


The extra factor in the latter expectation is the Radon-Nikodym derivative of $X_t$ with respect to
$\hat X_t$, which corrects for the change of process [112]. The advantage of using $\hat X_t$ for estimating (81)
is that it makes the event $A_T = a$ typical for some properly chosen value $k$, as we know from the
previous sections. This means that it is an efficient process for estimating large deviations: contrary
to $X_t$, it does not require an exponentially large sample (in $T$) to compute the exponentially small
probability $P_{L,T}\{A_T = a\}$ and its corresponding rate function $I(a)$ [112-115].

That this property of the driven process can be used in practice appears questionable at first, since this process is constructed from $r_k$ and $\Lambda_k$, the very elements needed to obtain $I(a)$. Recent works have shown, however, that it is possible to construct $r_k$ and $\Lambda_k$, and therefore $\hat X_t$ and $I(a)$, iteratively or adaptively without any prior knowledge of these functions and process by combining sampling and spectral methods. Examples of such adaptive methods have been developed by the group of Borkar [116, 117] and by Nemoto and Sasa [68]. They fall conceptually in the class of adaptive importance sampling methods [112], which can be seen as a form of feedback control. There is great scope in applying these methods in physics to estimate large deviation functions, to study nonequilibrium systems, and to establish further links between these systems, rare event simulations, and control theory.
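As a minimal sketch of the change of measure (81), the code below uses a discrete-time analog (the Markov chain setting of Appendix C) with a hypothetical two-state transition matrix and the observable $g(x,y) = 1$ for $x \neq y$, so that $NA_N$ counts jumps. The driven chain is built directly from the Perron-Frobenius data, which is only possible because the toy model is tiny; in practice $r_k$ and $\Lambda_k$ would have to be learned adaptively, as discussed above. Each path of the driven chain is reweighted by the likelihood ratio $\prod_i M(X_i,X_{i+1})/M_k(X_i,X_{i+1})$, and the estimate is compared with the exact probability obtained by dynamic programming. The matrix, $N$, $k$, the sample size and the seeds are all assumptions.

```python
import numpy as np

# Hypothetical two-state chain; g(x, y) = 1 if x != y, so N * A_N counts the jumps.
M = np.array([[0.9, 0.1],
              [0.3, 0.7]])
G = 1.0 - np.eye(2)
N, k = 200, 1.0

# Tilted matrix M(x,y) e^{k g(x,y)}, SCGF Lambda_k and right Perron eigenvector r_k.
Mtil = M * np.exp(k * G)
w, v = np.linalg.eig(Mtil)
i = np.argmax(w.real)
Lam = np.log(w[i].real)
r = np.abs(v[:, i].real)
# Driven chain M_k(x,y) = e^{-Lambda_k} r_k(x)^{-1} Mtil(x,y) r_k(y): rows sum to one.
Mk = Mtil * r[None, :] / (np.exp(Lam) * r[:, None])

# Typical jump fraction of the driven chain; {N A_N = target} is rare for M itself.
w2, v2 = np.linalg.eig(Mk.T)
pi_k = np.abs(v2[:, np.argmax(w2.real)].real)
pi_k /= pi_k.sum()
target = int(round(N * (pi_k @ (Mk * G).sum(axis=1))))


def exact_prob(count):
    """P{N A_N = count} for the original chain started uniformly (dynamic programming)."""
    dp = np.zeros((2, N + 1))            # dp[x, c] = P(current state x, c jumps so far)
    dp[:, 0] = 0.5
    for _ in range(N):
        new = np.zeros_like(dp)
        for x in range(2):
            for y in range(2):
                s = int(G[x, y])         # a jump adds one to the counter
                new[y, s:] += M[x, y] * dp[x, :N + 1 - s]
        dp = new
    return dp[:, count].sum()


def is_estimate(count, n_samples=20000, seed=1):
    """Importance sampling: simulate the driven chain and reweight each path by dP/dP_k."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n_samples)      # same uniform initial law as exact_prob
    logw = np.zeros(n_samples)                   # log of prod_i M / M_k along each path
    jumps = np.zeros(n_samples, dtype=int)
    for _ in range(N):
        y = (rng.random(n_samples) < Mk[x, 1]).astype(int)
        logw += np.log(M[x, y]) - np.log(Mk[x, y])
        jumps += (x != y)
        x = y
    return np.mean((jumps == count) * np.exp(logw))


print(target, exact_prob(target), is_estimate(target))
```

With these assumed parameters the target event has a probability far below $10^{-6}$ under the original chain, so direct sampling with the same budget would almost surely never observe it, whereas a sizeable fraction of the driven paths realize it.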


Appendix A: Proofs 
1. Optimal ρ* and j*


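The maximizer $(\rho^*, j^*)$ of (40) can be obtained by introducing Lagrange multipliers $\lambda(x)$ and $\mu$ for the constraints $\nabla\cdot j = 0$ and $\int\rho(x)\,dx = 1$. We thus consider the variations

$$\frac{\delta}{\delta\rho(a)}\left[kA(\rho,j) - I_{2.5}(\rho,j) - \int\lambda(x)\,\nabla\cdot j(x)\,dx - \mu\int\rho(x)\,dx\right] \tag{A1}$$

and

$$\frac{\delta}{\delta j(a)}\left[kA(\rho,j) - I_{2.5}(\rho,j) - \int\lambda(x)\,\nabla\cdot j(x)\,dx - \mu\int\rho(x)\,dx\right], \tag{A2}$$

which yield the equations for the maximizer $(\rho^*, j^*)$:

$$0 = kf(a) + \frac{1}{2}(\rho^*)^{-2}(a)\,[j^*(a) - J_{F,\rho^*}(a)]\,D^{-1}(a)\,[j^*(a) - J_{F,\rho^*}(a)] + (\rho^*D)^{-1}(a)\,[j^*(a) - J_{F,\rho^*}(a)]\cdot F(a) + \frac{1}{2}\left[\nabla\cdot\left((\rho^*)^{-1}(j^* - J_{F,\rho^*})\right)\right](a) - \mu \tag{A3}$$

and

$$0 = kg(a) - (\rho^*D)^{-1}(a)\,[j^*(a) - J_{F,\rho^*}(a)] + \nabla\lambda(a), \tag{A4}$$

respectively.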

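The second equation can be rewritten as

$$j^*(a) = J_{F,\rho^*}(a) + (\rho^*D)(a)\,[kg(a) + \nabla\lambda(a)] = \Big(F(a) + D[kg(a) + \nabla\lambda(a)]\Big)\rho^*(a) - \frac{D}{2}\nabla\rho^*(a) = J_{F+D(kg+\nabla\lambda),\rho^*}(a) \tag{A5}$$

and implies with the constraint $\nabla\cdot j^* = 0$ that

$$\rho^* = \rho^{\rm inv}_{F+D(kg+\nabla\lambda)}, \tag{A6}$$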

which is normalized by assumption. The first equation (A3), on the other hand, can be rewritten as

$$\mu = kf + \frac{1}{2}(kg + \nabla\lambda)D(kg + \nabla\lambda) + (kg + \nabla\lambda)\cdot F + \frac{1}{2}\nabla\cdot[D(kg + \nabla\lambda)]. \tag{A7}$$

This has the form $\mu = e^{-\lambda}(\mathcal{L}_k e^{\lambda})$ with $\mathcal{L}_k$ given as in (16), so we identify $e^{\lambda}$ as the Perron-Frobenius eigenvector $r_k$ of $\mathcal{L}_k$, since $e^{\lambda} > 0$ and $\mathcal{L}_k$ has only one positive eigenvector, and $\mu = \Lambda_k$ as its dominant eigenvalue. From equations (A5) and (A6) and the definition (22) of the modified drift $F_k$, we then obtain

$$\rho^* = \rho^{\rm inv}_{F_k},\qquad j^* = J_{F_k,\rho^{\rm inv}_{F_k}}, \tag{A8}$$

with $\rho^{\rm inv}_{F_k} = r_k l_k$.

2. Donsker-Varadhan result for g = 0

In the case $g = 0$, the variational representation (40) reduces by direct minimization on $j$ to the DV variational principle [52], which is in our notations

$$\Lambda_k = \sup_{\rho:\,\int\rho(x)dx=1}\left\{\int kf(x)\rho(x)\,dx - I_2(\rho)\right\}, \tag{A9}$$

where

$$I_2(\rho) = \inf_j I_{2.5}(\rho,j) = -\inf_{h>0}\int\rho(x)\,(h^{-1}Lh)(x)\,dx \tag{A10}$$

is the rate function of the empirical density $\rho_T$ for the Markov process with generator $L$, first derived by Donsker and Varadhan and now often referred to as the level-2 rate function [34-36]. This rate function is known to be explicitly given by

$$I_2(\rho) = -\int\rho^{\rm inv}(x)\,\sqrt{\frac{\rho}{\rho^{\rm inv}}}(x)\left(L\sqrt{\frac{\rho}{\rho^{\rm inv}}}\right)(x)\,dx \tag{A11}$$

when $X_t$ is reversible with respect to the invariant density $\rho^{\rm inv}$, that is, $\rho^{\rm inv}L(\rho^{\rm inv})^{-1} = L^\dagger$. As a particular case, if $L$ is hermitian, that is, if $X_t$ is reversible with respect to the constant (Lebesgue) density, then (A9) reduces to


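$$\Lambda_k = \sup_{\rho:\,\int\rho(x)dx=1}\int dx\left[kf(x)\rho(x) + \rho^{1/2}(x)(L\rho^{1/2})(x)\right] = \sup_{\sigma:\,\int\sigma^2(x)dx=1}\int dx\left[kf(x)\sigma^2(x) + \sigma(x)(L\sigma)(x)\right] = \sup_{\sigma:\,\int\sigma^2(x)dx=1}\int dx\,\sigma(x)(\mathcal{L}_k\sigma)(x), \tag{A12}$$

which is the Rayleigh-Ritz variational principle for $\Lambda_k$ [51, 94].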
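For a finite state space, (A9)-(A12) can be checked directly. The sketch below assumes a hypothetical reversible jump process, built so that $\rho^{\rm inv}(x)W(x,y) = \rho^{\rm inv}(y)W(y,x)$, and a random observable $f$ with $g = 0$. It compares the DV functional of (A10) for random $h > 0$ with the closed form (A11), and obtains $\Lambda_k$ from the symmetrized matrix $(\rho^{\rm inv})^{1/2}\mathcal{L}_k(\rho^{\rm inv})^{-1/2}$, which turns (A9) into a Rayleigh-Ritz problem in $\sigma = \rho^{1/2}$ as in (A12), now with respect to a non-uniform $\rho^{\rm inv}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Hypothetical reversible jump process: rho_inv(x) W(x,y) = rho_inv(y) W(y,x) by construction.
S = rng.uniform(0.2, 1.0, size=(n, n))
S = 0.5 * (S + S.T)
rho_inv = rng.uniform(0.5, 2.0, size=n)
rho_inv /= rho_inv.sum()                      # invariant density rho^inv
W = S * rho_inv[None, :]
np.fill_diagonal(W, 0.0)
L = W - np.diag(W.sum(axis=1))                # generator L = W - lambda
f = rng.normal(size=n)                        # observable f(x), with g = 0
k = 0.8


def level2_closed(rho):
    """(A11): I_2(rho) = -sum_x rho_inv(x) s(x) (L s)(x) with s = sqrt(rho / rho_inv)."""
    s = np.sqrt(rho / rho_inv)
    return -np.sum(rho_inv * s * (L @ s))


def dv_functional(rho, h):
    """-sum_x rho(x) (L h)(x) / h(x); by (A10) its supremum over h > 0 is I_2(rho)."""
    return -np.sum(rho * (L @ h) / h)


# Symmetrize L_k = L + k f with rho_inv^{1/2}: Rayleigh-Ritz form of (A9).
Lk = L + k * np.diag(f)
sq = np.sqrt(rho_inv)
Sk = sq[:, None] * Lk / sq[None, :]
vals, vecs = np.linalg.eigh(Sk)
Lam = vals[-1]                                # Lambda_k
rho_star = vecs[:, -1] ** 2                   # maximizer of (A9): rho* = sigma*^2

print(Lam, k * f @ rho_star - level2_closed(rho_star))
```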


3. Relative entropy representations via level-2.5 large deviations 


The following representation of the level-2.5 rate function can be derived from a result of Barato and Chetrite [61]:

$$I_{2.5}(\rho,j) = \begin{cases}\displaystyle\lim_{T\to\infty}\frac{1}{T}D(P_{L_u,T}\,\|\,P_{L,T}) & \text{if } \nabla\cdot j = 0\\ \infty & \text{otherwise,}\end{cases} \tag{A13}$$

where $P_{L_u,T}$ is the path measure of a modified diffusion with drift $u$, chosen in such a way that the typical behavior of $(\rho_T, J_T)$ in the $T\to\infty$ limit is $(\rho, j)$; that is, $u$ is such that

$$(\rho_T, J_T)\ \xrightarrow{\ P_{L_u,T}\ }\ (\rho, j) \tag{A14}$$

or explicitly by (9),

$$u = \frac{j}{\rho} + \frac{D}{2}\nabla\ln\rho. \tag{A15}$$

With the change of variables $(\rho, j)\to(\rho, u)$ with $J_{u,\rho} = j$ introduced before in (43), we then get $\rho = \rho_u^{\rm inv}$ and

$$I_{2.5}(\rho_u^{\rm inv}, J_{u,\rho_u^{\rm inv}}) = \lim_{T\to\infty}\frac{1}{T}D(P_{L_u,T}\,\|\,P_{L,T}) \tag{A16}$$

for the function $I_{2.5}(\rho_u^{\rm inv}, J_{u,\rho_u^{\rm inv}})$ shown in (44).

The representation (A13) can be substituted directly into (40) and (49) to obtain representations for $\Lambda_k$ and $I(a)$, respectively, involving $(\rho, j)$ and the drift $u$ defined in (A14). Similarly, we can substitute the more explicit formula (A16) above into (43) and (51) to obtain relative entropy representations of $\Lambda_k$ and $I(a)$, respectively, which are now explicit in $u$. From these, we obtain the final representations (53) and (54) announced in Sec. III D by noting that the limit in probability (A14) also implies a limit in mean, so that

$$A(\rho_u^{\rm inv}, J_{u,\rho_u^{\rm inv}}) = \lim_{T\to\infty}\mathbb{E}_{P_{L_u,T}}[A_T] = \lim_{T\to\infty}\mathbb{E}_{P_{L_u,T}}[A(\rho_T, J_T)]. \tag{A17}$$

From this proof, it is clear that the variational representations of $\Lambda_k$ and $I(a)$ expressed via $(\rho^*, j^*)$ and $u$ are equivalent to the relative entropy representations of these functions. It is also clear that the solution $(\rho^*, j^*)$ of the Laplace principle (40) or the contraction principle (49) is the typical value of $(\rho_T, J_T)$ in the driven process, as noted before in (42): the solution of (53) or (54) is $u^* = F_k$, so that $u(\rho^*, j^*) = u^*$, implying (42) from (A14).


4. Eigenfunction representation (59)

The modified variational principle (59) follows from the variational principle (43) involving the drift $u$ by performing the contraction $u\to h$ with

$$u = F + D(kg + \nabla\ln h). \tag{A18}$$

Since the maximizer of (43) has this form, we do not restrict this result by rewriting it with (9) as

$$\Lambda_k = \sup_{h>0}\int dx\,\rho^{\rm inv}_{F+D(kg+\nabla\ln h)}(x)\left[kf + kg\cdot F + \frac{k}{2}\nabla\cdot(Dg) + \frac{1}{2}(kg + \nabla\ln h)D(kg - \nabla\ln h)\right](x). \tag{A19}$$

The term in brackets can be expressed as

$$kf + kg\cdot F + \frac{k}{2}\nabla\cdot(Dg) + \frac{1}{2}(kg + \nabla\ln h)D(kg - \nabla\ln h) = h^{-1}(\mathcal{L}_k h) - (\mathcal{L}_k^h\ln h), \tag{A20}$$

where

$$\mathcal{L}_k^h = h^{-1}\mathcal{L}_k h - h^{-1}(\mathcal{L}_k h) \tag{A21}$$

is the generalized Doob h-transform of $\mathcal{L}_k$ [2]. Similarly to (20), this transform defines a new diffusion with drift $u$ given by (A18) and stationary density $\rho^{\rm inv}_{F+D(kg+\nabla\ln h)}$. Consequently,

$$\int dx\,\rho^{\rm inv}_{F+D(kg+\nabla\ln h)}(x)\,(\mathcal{L}_k^h\ln h)(x) = 0, \tag{A22}$$

so that (A19) combined with (A20) reduces to (59).

5. Eigenfunction representations (60) and (61)

The representation (61) for the rate function can be derived from the representation (49) involving $(\rho, j)$. In the case $g = 0$,

$$A(\rho, j) = \int dx\,\rho(x)f(x), \tag{A23}$$

so that (49) becomes

$$I(a) = \inf_{\substack{\rho:\,\int\rho(x)dx=1\\ \int\rho(x)f(x)dx=a}}\ \inf_j I_{2.5}(\rho, j) = \inf_{\substack{\rho:\,\int\rho(x)dx=1\\ \int\rho(x)f(x)dx=a}} I_2(\rho), \tag{A24}$$

where $I_2(\rho)$ is the level-2 rate function. From the expression of this function shown in (A10), we thus get

$$I(a) = \inf_{\substack{\rho:\,\int\rho(x)dx=1\\ \int\rho(x)f(x)dx=a}}\left\{-\inf_{h>0}\int\rho(x)(h^{-1}Lh)(x)\,dx\right\}. \tag{A25}$$

To obtain (61), we then only need to perform the following change of variables:

$$(h, \rho)\to(r, l) = (h, \rho h^{-1}). \tag{A26}$$

It is not clear whether this change of variables preserves the minimal nature of $h$ in (A25); hence the transformation of the infimum in (A25) to a stationary point optimization in (61).

The same argument applied to (40) yields (60), knowing that $\mathcal{L}_k = L + kf$ for $g = 0$. In both cases, the solution $(r^*, l^*) = (r_k, l_k)$, or equivalently $(h^*, \rho^*) = (r_k, r_k l_k)$, is obtained by solving the infimum on $h$ in (A25) for the known solution $\rho^* = r_k l_k$.


6. Modified HJB equation for current costs

We want to derive the HJB equation for the stochastic control problem on the finite horizon $[t, T]$ involving the controlled diffusion $X_s^u$ defined in (64) and the following cost function:

$$C_t^T(x) = \inf_{u_{[t,T]}}\mathbb{E}\left[\int_t^T\psi(X_s^u, u_s)\,ds + \int_t^T\phi(X_s^u)\circ dX_s^u\right], \tag{A27}$$

where the expectation is with respect to the law of the controlled process started at $X_t^u = x$. The part involving $\psi$ is the usual cost considered in optimal control theory; the added current or Stratonovich cost involving $\phi$ has been considered only recently by Chernyak, Chertkov, Bierkens and Kappen [47], who give a partial solution for quadratic costs in the control drift $u_t$.

More general results can be obtained in a very simple way by showing that the term involving $\phi$ can be absorbed in $\psi$ to rewrite $C_t^T$ as

$$C_t^T(x) = \inf_{u_{[t,T]}}\mathbb{E}\left[\int_t^T\tilde\psi(X_s^u, u_s)\,ds\right] \tag{A28}$$

with the modified cost

$$\tilde\psi(x, u) = \psi(x, u) + \phi(x)\cdot u(x) + \frac{1}{2}\nabla\cdot(D\phi(x)). \tag{A29}$$

This follows by noting that

$$\mathbb{E}\left[\int_t^T\psi(X_s^u, u_s)\,ds + \int_t^T\phi(X_s^u)\circ dX_s^u\right] = \int_t^T ds\int dy\left[\psi(y, u_s)\,P^u_{t,s}(x,y) + \phi(y)\cdot J_{u,P^u_{t,s}(x,\cdot)}(y)\right], \tag{A30}$$

where

$$P^u_{t,s}(x,y) = e^{(s-t)L_u}(x,y) \tag{A31}$$

is the transition probability for the controlled diffusion between $X_t = x$ and $X_s = y$ with $s > t$, and $J_{u,P^u_{t,s}(x,\cdot)}(y)$ is the associated probability current defined in (9). Inserting this definition in (A30) and using integration by parts yields $\tilde\psi$ as above.
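The integration by parts behind (A29) can be checked numerically. The sketch below is only an illustration under our own assumptions: one dimension, constant $D$, and the current written as $J_{u,\rho} = u\rho - \frac{D}{2}\rho'$. It compares the expected Stratonovich cost per unit time, $\int\phi\,J_{u,\rho}\,dx$, with the average over $\rho$ of the extra terms $\phi u + \frac{1}{2}(D\phi)'$ of $\tilde\psi$. The density $\rho$, drift $u$ and function $\phi$ are hypothetical choices.

```python
import numpy as np

# Uniform 1D grid, wide enough that boundary terms in the integration by parts vanish.
x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]
D = 0.5                                                  # constant diffusion coefficient (assumed)

rho = np.exp(-(x - 0.5) ** 2 / 2) / np.sqrt(2 * np.pi)   # hypothetical density
drho = -(x - 0.5) * rho                                  # its derivative
u = -x + np.sin(x)                                       # hypothetical control drift u(x)
phi = np.cos(x) + 0.3 * x ** 2                           # hypothetical current-cost function
dphi = -np.sin(x) + 0.6 * x

J = u * rho - 0.5 * D * drho                             # current J_{u,rho} for constant D

stratonovich_rate = np.sum(phi * J) * dx                       # E[phi(X) o dX] per unit time
absorbed_rate = np.sum(rho * (phi * u + 0.5 * D * dphi)) * dx  # extra terms of psi-tilde, averaged
print(stratonovich_rate, absorbed_rate)
```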

In this purely additive form involving only $\tilde\psi$, the cost $C_t^T(x)$ now satisfies the usual backward HJB equation

$$-\partial_t C_t^T(x) = \inf_u\left\{\tilde\psi(x, u) + L_u C_t^T(x)\right\} \tag{A32}$$

with $C_T^T = 0$. Given (A29), we therefore obtain

$$-\partial_t C_t^T(x) = \inf_u\left\{\psi(x, u) + \phi(x)\cdot u(x) + \frac{1}{2}\nabla\cdot(D\phi(x)) + L_u C_t^T(x)\right\}. \tag{A33}$$

It can be checked that this recovers the results of [47] for quadratic costs.


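7. Control representation

The PDE satisfied by the exponential cost $G_t^T(x,k)$ is obtained by inserting the solution (68) for $u^*$ in the HJB equation (67) and by using the expression of the generator $L_u$:

$$-\frac{\partial_t G_t^T}{G_t^T} = kf + \frac{k}{2}\nabla\cdot(Dg) + \frac{1}{2}\left[kg + \frac{\nabla G_t^T}{G_t^T}\right]D\left[kg + \frac{\nabla G_t^T}{G_t^T}\right] + F\cdot\left[kg + \frac{\nabla G_t^T}{G_t^T}\right] + \frac{1}{2}\nabla\cdot\left(D\,\frac{\nabla G_t^T}{G_t^T}\right)$$

$$= kf + \frac{k}{2}\nabla\cdot(Dg) + \frac{k}{2}(Dg)\cdot\left[kg + 2\frac{\nabla G_t^T}{G_t^T}\right] + F\cdot\left[kg + \frac{\nabla G_t^T}{G_t^T}\right] + \frac{1}{2}\,\frac{\nabla\cdot(D\nabla G_t^T)}{G_t^T}. \tag{A34}$$

In these equations, all gradients are in $x$. Consequently,

$$\partial_t G_t^T + \left[kf + \frac{k}{2}\nabla\cdot(Dg) + \frac{k^2}{2}g\cdot Dg + kF\cdot g\right]G_t^T + (F + kDg)\cdot\nabla G_t^T + \frac{1}{2}\nabla\cdot(D\nabla G_t^T) = 0, \tag{A35}$$

which can be rewritten with the tilted generator $\mathcal{L}_k$ (16) as

$$(\partial_t + \mathcal{L}_k)G_t^T = 0,\qquad G_T^T = 1, \tag{A36}$$

as claimed in (69).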
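Equation (A36) says that $G_t^T$ is obtained by propagating the constant function 1 backward in time with the tilted generator, so that $T^{-1}\ln G_0^T$ approaches $\Lambda_k$ for large $T$. The sketch below illustrates this with a finite-state jump process standing in for the diffusion; this substitution, the rates and the observables are our assumptions. It integrates the backward equation with a fourth-order Runge-Kutta scheme, renormalizing at each step to avoid overflow, and then forms the jump analog of the optimal control, the rates $W(x,y)e^{kg(x,y)}G(y)/G(x)$, which approach the driven rates of Appendix B as $T - t$ grows.

```python
import numpy as np

# Finite-state jump process standing in for the diffusion (hypothetical rates and observables).
W = np.array([[0.0, 1.0, 0.5],
              [0.8, 0.0, 1.2],
              [0.4, 0.9, 0.0]])
g = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])       # antisymmetric jump observable (cycle current)
f = np.array([0.2, -0.1, 0.3])
k = 0.5

# Tilted generator W e^{kg} - lambda + k f, the jump analog of (16).
Lk = W * np.exp(k * g) - np.diag(W.sum(axis=1)) + k * np.diag(f)
Lam = np.linalg.eigvals(Lk).real.max()


def backward_G(T, dt=0.01):
    """Solve (d/dt + L_k) G = 0 backward from G_T = 1 (RK4); return log G_0 and G_0 up to scale."""
    G, log_scale = np.ones(3), 0.0
    for _ in range(int(round(T / dt))):
        # In reversed time tau = T - t the equation reads dG/dtau = L_k G.
        s1 = Lk @ G
        s2 = Lk @ (G + 0.5 * dt * s1)
        s3 = Lk @ (G + 0.5 * dt * s2)
        s4 = Lk @ (G + dt * s3)
        G = G + dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        c = G.max()                      # renormalize to avoid overflow, keep track of the scale
        log_scale += np.log(c)
        G = G / c
    return log_scale + np.log(G), G


T = 200.0
logG0, G0 = backward_G(T)
print(logG0 / T, Lam)                    # growth rate of G approaches Lambda_k

# Jump analog of the optimal control: rates W e^{kg} G(y) / G(x).
Q_star = W * np.exp(k * g) * G0[None, :] / G0[:, None]
```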
Appendix B: Jump processes

We translate in this section the results of the previous sections in the language of pure jump processes. The notations follow as before those of [2].

We consider an ergodic pure jump process $X_t$, $t\in[0,T]$, defined by the transition kernel $W(x,y)$ representing the transition rate (probability density per unit time) for the transition going from $x$ to $y$. The generator $L$ of this process is expressed in terms of $W(x,y)$ as

$$Lh(x) = \int dy\,W(x,y)[h(y) - h(x)], \tag{B1}$$

where $h(x)$ is a bounded measurable function on the space of $X_t$. We also define from $W(x,y)$ the escape rates

$$\lambda(x) = \int W(x,y)\,dy = (W1)(x). \tag{B2}$$

With these elements, it is then common to express the generator as $L = W - \lambda$.

For jump processes, the general observable $A_T$ defined in (12) must be modified to account for the fact that paths of these processes have discontinuities. This leads us to consider

$$A_T = \frac{1}{T}\int_0^T f(X_t)\,dt + \frac{1}{T}\sum_{t:\,\Delta X_t\neq 0} g(X_{t^-}, X_{t^+}), \tag{B3}$$

where the sum is over all times $t$ at which a jump occurs, with state $X_{t^-}$ before the jump and $X_{t^+}$ after the jump. The choice of functions $g(x,y)$ and $f(x)$ depends on the application or physical observable considered. Choosing $f = 0$ and $g(x,y) = 1$, for example, gives the number of jumps per unit time occurring in $[0,T]$, which is called the activity [58, 88, 89], while $f = 0$ and $g(x,y) = -g(y,x) = 1$ gives the current per unit time [58, 61].

The large deviations of $A_T$ can be determined similarly as for diffusions from the SCGF $\Lambda_k$, which corresponds to the largest eigenvalue of the tilted generator

$$\mathcal{L}_k = We^{kg} - \lambda + kf, \tag{B4}$$

where the first term on the right-hand side of the equation is understood as the Hadamard (component-wise) product [2]. As before, we denote the eigenfunction corresponding to $\Lambda_k$ by $r_k$ and the dual eigenfunction by $l_k$. The driven process associated with the conditioned process $X_t|A_T = a$ is then defined exactly as in (20) by a generalized Doob transform of $\mathcal{L}_k$, which yields for jump processes the driven generator $L_k = W_k - \lambda_k$ involving the driven rates

$$W_k(x,y) = r_k^{-1}(x)\,W(x,y)\,e^{kg(x,y)}\,r_k(y) \tag{B5}$$

and the driven escape rates $\lambda_k(x) = (W_k1)(x)$ [2].
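For a finite state space, (B4) and (B5) reduce to matrix operations. The sketch below, with hypothetical rates and observables on three states, computes $\Lambda_k$, $r_k$ and $l_k$ with NumPy, builds the driven rates $W_k$, and checks that the driven generator $W_k - \lambda_k$ conserves probability, that $\lambda_k = \Lambda_k + \lambda - kf$ (which follows from $\mathcal{L}_k r_k = \Lambda_k r_k$), and that $r_k l_k$, normalized so that $\sum_x r_k(x)l_k(x) = 1$, is its stationary density, consistent with the limit $\rho^{\rm inv}_{W_k} = r_k l_k$ stated below.

```python
import numpy as np

# Hypothetical three-state jump process (rates W, observables f and g).
W = np.array([[0.0, 1.0, 0.5],
              [0.8, 0.0, 1.2],
              [0.4, 0.9, 0.0]])
g = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
f = np.array([0.2, -0.1, 0.3])
k = 0.5

lam = W.sum(axis=1)                                       # escape rates (B2)
Lk = W * np.exp(k * g) - np.diag(lam) + k * np.diag(f)    # tilted generator (B4)

vals, vecs = np.linalg.eig(Lk)
i = np.argmax(vals.real)
Lam = vals[i].real                                        # SCGF Lambda_k
r = np.abs(vecs[:, i].real)                               # right eigenvector r_k
lvals, lvecs = np.linalg.eig(Lk.T)
l = np.abs(lvecs[:, np.argmax(lvals.real)].real)          # dual eigenvector l_k
l = l / (l @ r)                                           # normalization sum_x r_k l_k = 1

Wk = W * np.exp(k * g) * r[None, :] / r[:, None]          # driven rates (B5)
lam_k = Wk.sum(axis=1)                                    # driven escape rates (W_k 1)(x)
L_driven = Wk - np.diag(lam_k)                            # driven generator W_k - lambda_k
rho = r * l                                               # candidate stationary density r_k l_k
print(Lam, rho, rho @ L_driven)
```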

The equivalence results expressed by (24) or (25) also hold with the path measure $P_{L_k,T}$ replaced by the path measure $P_{W_k,T}$ of the jump process with rates $W_k$ and imply, similarly to the diffusion case, that the driven jump process $\hat X_t$ is equivalent to the conditioned jump process $X_t|A_T = a$ at the level of stationary states. In particular, both processes have the same empirical density $\rho_T(x)$ in the limit $T\to\infty$, which converges to $\rho^{\rm inv}_{W_k}(x) = r_k(x)l_k(x)$. They also have the same asymptotic empirical current, which in the case of jump processes is defined as

$$J_T(x,y) = C_T(x,y) - C_T(y,x), \tag{B6}$$

where

$$C_T(x,y) = \frac{1}{T}\sum_{t:\,\Delta X_t\neq 0}\delta(X_{t^-} - x)\,\delta(X_{t^+} - y) \tag{B7}$$

is the so-called empirical flow [60] corresponding to the number (per unit time) of jumps from $x$ to $y$. The latter quantity converges in the driven jump process to

$$C^{\rm inv}_{W_k}(x,y) = \rho^{\rm inv}_{W_k}(x)\,W_k(x,y). \tag{B8}$$

Footnote 12: Integrals must be replaced by sums in this appendix if $X_t$ lives in a discrete space.


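Therefore,

$$J_T(x,y)\to J_{W_k,\rho^{\rm inv}_{W_k}}(x,y) = \rho^{\rm inv}_{W_k}(x)\,W_k(x,y) - W_k(y,x)\,\rho^{\rm inv}_{W_k}(y). \tag{B9}$$

For $I(a)$ convex, we then also have

$$\rho_T\to\rho^{\rm inv}_{W_k},\qquad J_T\to J_{W_k,\rho^{\rm inv}_{W_k}} \tag{B10}$$

in the conditioned jump process $X_t|A_T = a$ for $k = I'(a)$.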

These results are essentially the same as for diffusions, except for the definition of the empirical current. For building the contraction of $A_T$, we also need the pair $(\rho_T, C_T)$ rather than $(\rho_T, J_T)$. As a function of $\rho_T$ and $C_T$, we indeed have

$$A(\rho_T, C_T) = \int f(x)\rho_T(x)\,dx + \int g(x,y)C_T(x,y)\,dx\,dy. \tag{B11}$$


Moreover, $(\rho_T, C_T)$ is known to satisfy an LDP with rate function

$$I_{2.5}(\rho, C) = \int dx\,dy\left(C(x,y)\ln\frac{C(x,y)}{\rho(x)W(x,y)} - C(x,y) + \rho(x)W(x,y)\right) \tag{B12}$$

if

$$\int C(x,y)\,dy = \int C(y,x)\,dy \tag{B13}$$

for all $x$, and $I_{2.5}(\rho, C) = \infty$ otherwise. More detail about these results can be found in [58, 60, 61].

From here, we translate our results of the previous sections as follows.

First, we obtain a jump version of the first variational representation (40) using $B_T = (\rho_T, C_T)$ and the rate function $I_{2.5}(\rho, C)$ in (B12) to get

$$\Lambda_k = \sup_{\rho, C}\{kA(\rho, C) - I_{2.5}(\rho, C)\}. \tag{B14}$$

It is understood that the maximization is over all densities $\rho$ such that $\int\rho(x)dx = 1$ and balanced flows $C$ satisfying (B13). The maximizer, as expected and as can be checked explicitly, is $\rho^* = \rho^{\rm inv}_{W_k}$ and $C^*(x,y) = \rho^{\rm inv}_{W_k}(x)W_k(x,y)$.

Second, we can re-parameterize the maximization in (B14) in terms of a transition rate matrix that determines the ergodic limit of both $\rho_T$ and $C_T$ to obtain a jump version of the variational representation (43). To be more precise, let us denote by $\rho_Q^{\rm inv}$ the invariant density of an ergodic jump process with transition rate matrix $Q(x,y)$, and let $C_Q^{\rm inv}$ be the invariant empirical flow (B8) obtained with the same transition matrix. Given the change of variables $C\to Q$ with $C(x,y) = \rho(x)Q(x,y)$, the constraint (B13) then implies $\rho = \rho_Q^{\rm inv}$, so that

$$\Lambda_k = \sup_Q\left\{kA(\rho_Q^{\rm inv}, C_Q^{\rm inv}) - I_{2.5}(\rho_Q^{\rm inv}, C_Q^{\rm inv})\right\}, \tag{B15}$$

where

$$I_{2.5}(\rho_Q^{\rm inv}, C_Q^{\rm inv}) = \int dx\,dy\,\rho_Q^{\rm inv}(x)\left(Q(x,y)\ln\frac{Q(x,y)}{W(x,y)} - Q(x,y) + W(x,y)\right) \tag{B16}$$

is the level-2.5 rate function (B12) expressed with the flow (B8) and transition matrix $Q$. In this form, it can be checked that the maximizer $Q^*$ is $W_k$, the transition rate matrix of the driven jump process.
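The maximization in (B15)-(B16) can be probed numerically. The sketch below, with a hypothetical three-state model, evaluates $kA(\rho_Q^{\rm inv}, C_Q^{\rm inv}) - I_{2.5}(\rho_Q^{\rm inv}, C_Q^{\rm inv})$ for the driven rates $Q = W_k$ and for randomly drawn rate matrices $Q$. The former should reproduce $\Lambda_k$, and the latter should never exceed it. The rates, observables, sampling range for $Q$ and seed are assumptions.

```python
import numpy as np

# Hypothetical three-state jump process and observables.
W = np.array([[0.0, 1.0, 0.5],
              [0.8, 0.0, 1.2],
              [0.4, 0.9, 0.0]])
g = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
f = np.array([0.2, -0.1, 0.3])
k = 0.5
off = ~np.eye(3, dtype=bool)                  # off-diagonal entries = actual transitions


def stationary(Q):
    """Invariant density rho_Q^inv of the jump process with rate matrix Q."""
    A = np.vstack([(Q - np.diag(Q.sum(axis=1))).T, np.ones(len(Q))])
    b = np.zeros(len(Q) + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


def objective(Q):
    """k A(rho_Q, C_Q) - I_2.5(rho_Q, C_Q) with C_Q(x,y) = rho_Q(x) Q(x,y), cf. (B11), (B16)."""
    rho = stationary(Q)
    C = rho[:, None] * Q
    A = f @ rho + np.sum(g * C)
    ratio = np.ones_like(Q)
    ratio[off] = Q[off] / W[off]               # diagonal terms contribute zero
    I25 = np.sum(rho[:, None] * (Q * np.log(ratio) - Q + W))
    return k * A - I25


# Driven rates W_k from the tilted generator (B4)-(B5).
Lk = W * np.exp(k * g) - np.diag(W.sum(axis=1)) + k * np.diag(f)
vals, vecs = np.linalg.eig(Lk)
i = np.argmax(vals.real)
Lam, r = vals[i].real, np.abs(vecs[:, i].real)
Wk = W * np.exp(k * g) * r[None, :] / r[:, None]

rng = np.random.default_rng(3)
trials = [objective(rng.uniform(0.05, 3.0, size=(3, 3)) * off) for _ in range(200)]
print(Lam, objective(Wk), max(trials))
```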

Third, the representation (49) of the rate function $I(a)$ becomes, by considering the constrained minimization (34) with $B_T = (\rho_T, C_T)$ and the rate function $I_{2.5}(\rho, C)$,

$$I(a) = \inf_{(\rho, C):\,A(\rho, C) = a} I_{2.5}(\rho, C), \tag{B17}$$

which can be re-written as

$$I(a) = \inf_{Q:\,A(\rho_Q^{\rm inv}, C_Q^{\rm inv}) = a} I_{2.5}(\rho_Q^{\rm inv}, C_Q^{\rm inv}) \tag{B18}$$

using the same re-parameterization as before.

Fourth and finally, we can consider the jump process $X_t^Q$ as being controlled by a choice of transition rates $Q(x,y)$ to rewrite all these representations in control form. The main result worth noting in this case is

$$\Lambda_k = \lim_{T\to\infty}\sup_Q\{kA_T^Q - K_T^Q\} \tag{B19}$$

for almost all paths, where

$$A_T^Q = \frac{1}{T}\int_0^T f(X_t^Q)\,dt + \frac{1}{T}\sum_{t:\,\Delta X_t^Q\neq 0} g(X_{t^-}^Q, X_{t^+}^Q) \tag{B20}$$

is the value of the observable obtained with respect to the $Q$-controlled process $X_t^Q$ and

$$K_T^Q = -\frac{1}{T}\int_0^T dt\int dy\,[Q(X_t^Q, y) - W(X_t^Q, y)] + \frac{1}{T}\sum_{t:\,\Delta X_t^Q\neq 0}\ln\frac{Q(X_{t^-}^Q, X_{t^+}^Q)}{W(X_{t^-}^Q, X_{t^+}^Q)} \tag{B21}$$

is the empirical version of the rate function shown in (B16). Equation (B19) is the jump analog of our control result (78). It was obtained before in mean form by Jack and Sollich [70].
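A single long trajectory is enough to illustrate (B19) for the optimal choice $Q = W_k$. The sketch below simulates the $Q$-controlled process with the Gillespie algorithm and accumulates $kA_T^Q - K_T^Q$ from (B20)-(B21). For $Q = W_k$, the time-integral part equals $\Lambda_k$ exactly (since $\lambda_k - \lambda = \Lambda_k - kf$), and the jump part telescopes to $[\ln r_k(X_0) - \ln r_k(X_T)]/T$, so one path already gives $\Lambda_k$ up to an $O(1/T)$ boundary term. The three-state model, $T$ and the seed are hypothetical.

```python
import numpy as np

# Hypothetical three-state jump process and observables.
W = np.array([[0.0, 1.0, 0.5],
              [0.8, 0.0, 1.2],
              [0.4, 0.9, 0.0]])
g = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
f = np.array([0.2, -0.1, 0.3])
k = 0.5
lam = W.sum(axis=1)                                   # escape rates of the original process

Lk = W * np.exp(k * g) - np.diag(lam) + k * np.diag(f)
vals, vecs = np.linalg.eig(Lk)
i = np.argmax(vals.real)
Lam, r = vals[i].real, np.abs(vecs[:, i].real)
Wk = W * np.exp(k * g) * r[None, :] / r[:, None]       # driven rates (B5)


def control_functional(Q, T, x0=0, seed=0):
    """Simulate one path of the Q-controlled process (Gillespie); return k A_T^Q - K_T^Q."""
    rng = np.random.default_rng(seed)
    esc = Q.sum(axis=1)
    # Between jumps: k f(x) from k A_T^Q, plus [Q escape - W escape](x) from -K_T^Q.
    running = k * f + esc - lam
    t, x, total = 0.0, x0, 0.0
    while True:
        tau = rng.exponential(1.0 / esc[x])
        if t + tau >= T:                                # no further jump before time T
            total += (T - t) * running[x]
            break
        total += tau * running[x]
        t += tau
        y = rng.choice(len(Q), p=Q[x] / esc[x])
        total += k * g[x, y] - np.log(Q[x, y] / W[x, y])  # jump contributions to k A - K
        x = y
    return total / T


value_driven = control_functional(Wk, T=200.0)
value_free = control_functional(W, T=200.0)          # Q = W: K_T^Q = 0, only k A_T remains
print(Lam, value_driven, value_free)
```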

We do not translate the variational representations involving the relative entropy, since they are obtained in the same way as for diffusions. These representations rationalize, in terms of large deviation functions, the results derived for jump processes by Monthus [80], who refers to the limit of the path relative entropy as the Kolmogorov-Sinai entropy. Some connections that exist between the driven process, the maximum caliber method, and the effective transition rates introduced by Evans [74-76] (see [2] for more details) are also mentioned in [80].

Appendix C: Markov chains

The translation of our results for an ergodic Markov chain $(X_i)_{i\in\mathbb{N}}$ with transition matrix $M(x,y)$ closely follows the previous appendix and Appendix E of [2], so we shall be brief in this section. The quantities and concepts to consider are as follows:


• Observable:

$$A_N = \frac{1}{N}\sum_{i=0}^{N-1} g(X_i, X_{i+1}), \tag{C1}$$

where $g$ is an arbitrary function. This observable includes one-point observables by choosing $g(x,y) = f(x)$.

• Tilted matrix: The SCGF is now the logarithm of the dominant eigenvalue of the matrix

$$\mathcal{M}_k(x,y) = M(x,y)\,e^{kg(x,y)}. \tag{C2}$$

As before, we denote by $r_k$ the associated eigenvector of $\mathcal{M}_k$ and by $l_k$ the eigenvector of the dual (transpose) of $\mathcal{M}_k$, which is nothing but the left eigenvector of $\mathcal{M}_k$.

• Driven Markov chain: The discrete-time version of the driven process is the Markov chain with modified transition probabilities

$$M_k(x,y) = e^{-\Lambda_k}\,r_k^{-1}(x)\,\mathcal{M}_k(x,y)\,r_k(y). \tag{C3}$$

The stationary density of this process is also $\rho^{\rm inv}_{M_k}(x) = r_k(x)l_k(x)$; see Appendix E of [2].

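• Empirical density:

$$\rho_{1,N}(x) = \frac{1}{N}\sum_{i=0}^{N-1}\delta_{X_i,x}, \tag{C4}$$

where $\delta_{x,y}$ is the Kronecker symbol.

• Pair empirical density:

$$\rho_{2,N}(x,y) = \frac{1}{N}\sum_{i=0}^{N-1}\delta_{X_i,x}\,\delta_{X_{i+1},y}. \tag{C5}$$

This is the Markov chain analog of the empirical flow $C_T(x,y)$.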


• Contraction: The empirical density is not needed to obtain a representation of $A_N$, as defined in (C1); the pair empirical density is sufficient:

$$A(\rho_2) = \int g(x,y)\rho_2(x,y)\,dx\,dy. \tag{C6}$$

• LDP for the representing observable: The pair empirical density is known to satisfy an LDP with rate function [17, 85, 93]

$$I_{2.5}(\rho_2) = \begin{cases}\displaystyle\int dx\,dy\,\rho_2(x,y)\ln\frac{\rho_2(x,y)}{\rho_1(x)M(x,y)} & \text{if } \int\rho_2(x,y)\,dy = \int\rho_2(y,x)\,dy \text{ for all } x\\ \infty & \text{otherwise,}\end{cases} \tag{C7}$$

where $\rho_1(x) = \int\rho_2(x,y)\,dy$ is the marginal of $\rho_2$.

• Typical asymptotic states in the driven Markov chain: $\rho_{2,N}(x,y)$ converges in the limit $N\to\infty$ to the stationary joint distribution of the ergodic Markov chain considered. For the driven process, we thus have

$$\rho_{2,N}(x,y)\to\rho^{\rm inv}_{M_k}(x)M_k(x,y) = l_k(x)\,\mathcal{M}_k(x,y)\,r_k(y)\,e^{-\Lambda_k}. \tag{C8}$$

By contraction, this also implies

$$\rho_{1,N}(x)\to\rho^{\rm inv}_{M_k}(x) = r_k(x)l_k(x) \tag{C9}$$

for the driven Markov chain.

• Equivalence with conditioned Markov chain: Assuming that $A_N$ satisfies an LDP with convex rate function $I(a)$, we have the same two limits above for the conditioned Markov chain $X_n|A_N = a$ with $k = I'(a)$.
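The driven chain (C3) and the limits (C8)-(C9) can be verified directly for a small chain. In the sketch below, the transition matrix $M$, the pair observable $g$ and the value of $k$ are hypothetical. It computes the Perron-Frobenius data of $\mathcal{M}_k$ with NumPy and checks that $M_k$ is stochastic, that $r_k l_k$ is its stationary density, and that the stationary pair density has the form given in (C8).

```python
import numpy as np

# Hypothetical strictly positive transition matrix and pair observable.
M = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
g = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
k = 0.7

Mtil = M * np.exp(k * g)                              # tilted matrix (C2)
vals, vecs = np.linalg.eig(Mtil)
i = np.argmax(vals.real)
Lam = np.log(vals[i].real)                            # SCGF: log of the dominant eigenvalue
r = np.abs(vecs[:, i].real)                           # right eigenvector r_k
lvals, lvecs = np.linalg.eig(Mtil.T)
l = np.abs(lvecs[:, np.argmax(lvals.real)].real)      # left eigenvector l_k
l = l / (l @ r)                                       # normalization sum_x l_k(x) r_k(x) = 1

Mk = np.exp(-Lam) * Mtil * r[None, :] / r[:, None]    # driven chain (C3)
rho = r * l                                           # stationary density r_k l_k
pair = rho[:, None] * Mk                              # limit of rho_{2,N} in (C8)
print(Lam, rho, pair)
```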

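The results of the previous sections are expressed in terms of these notations with minor changes. We only note the variational formula

$$\Lambda_k = \sup_{\rho_2}\{kA(\rho_2) - I_{2.5}(\rho_2)\}, \tag{C10}$$

which derives from the Laplace principle (36), its transition matrix version

$$\Lambda_k = \sup_Q\{kA(\rho_Q^{\rm inv}\otimes Q) - I_{2.5}(\rho_Q^{\rm inv}\otimes Q)\}, \tag{C11}$$

where $(\rho\otimes Q)(x,y) = \rho(x)Q(x,y)$, and the control representation

$$\Lambda_k = \lim_{N\to\infty}\sup_Q\{kA_N^Q - K_N^Q\}. \tag{C12}$$

The last variational representation involves the observable

$$A_N^Q = \frac{1}{N}\sum_{i=0}^{N-1} g(X_i^Q, X_{i+1}^Q) \tag{C13}$$

accumulated by the Markov chain $X_i^Q$ controlled by the transition matrix $Q(x,y)$, and the empirical version of the relative entropy:

$$K_N^Q = \frac{1}{N}\sum_{i=0}^{N-1}\ln\frac{Q(X_i^Q, X_{i+1}^Q)}{M(X_i^Q, X_{i+1}^Q)}. \tag{C14}$$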


The representation (C10) is the Markov chain analog of (40) and (B14), while (C11) is the Markov chain analog of (43) and (B15). The latter result, expressed explicitly with the contraction (C6) and rate function (C7), was previously derived by Sasa [118].
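The transition-matrix principle (C11) can be checked on a small example: for the driven chain $Q = M_k$ the functional should equal $\Lambda_k$, and for any other stochastic matrix it should be smaller. The sketch below assumes a hypothetical strictly positive $M$, so that $\rho_2 = \rho_Q^{\rm inv}\otimes Q$ has marginal $\rho_1 = \rho_Q^{\rm inv}$ and the logarithm in (C7) reduces to $\ln[Q(x,y)/M(x,y)]$. The observable, $k$, the random trial matrices and the seed are also assumptions.

```python
import numpy as np

# Hypothetical strictly positive transition matrix and pair observable.
M = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
g = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
k = 0.7


def stationary(Q):
    """Stationary distribution of a stochastic matrix Q (left eigenvector for eigenvalue 1)."""
    vals, vecs = np.linalg.eig(Q.T)
    p = np.abs(vecs[:, np.argmax(vals.real)].real)
    return p / p.sum()


def objective(Q):
    """k A(rho_Q x Q) - I(rho_Q x Q), using (C6) and (C7) with rho_2(x,y) = rho_Q(x) Q(x,y)."""
    p2 = stationary(Q)[:, None] * Q
    return k * np.sum(g * p2) - np.sum(p2 * np.log(Q / M))


# Driven chain (C3) from the tilted matrix (C2).
Mtil = M * np.exp(k * g)
vals, vecs = np.linalg.eig(Mtil)
i = np.argmax(vals.real)
Lam = np.log(vals[i].real)
r = np.abs(vecs[:, i].real)
Mk = np.exp(-Lam) * Mtil * r[None, :] / r[:, None]

rng = np.random.default_rng(4)
trials = []
for _ in range(200):
    Q = rng.uniform(0.05, 1.0, size=(3, 3))
    trials.append(objective(Q / Q.sum(axis=1, keepdims=True)))
print(Lam, objective(Mk), max(trials))
```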

Markov chain versions of the representations involving the rate function and the relative entropy follow similarly. As for the jump process case, they rationalize from the large deviation point of view the results of Monthus [80].